THE WORLD SERIES COMPETITION*


FREDERICK MOSTELLER
Harvard University 


Suppose a number of pairs of teams or products are com- 
pared on the basis of n binomial trials. Although we cannot 
know from the outcomes which teams or products were ac- 
tually better, we wish to estimate the average probability that 
the better team or product wins a given trial, and thus to meas- 
ure the discrimination provided by our test. World Series data 
provide an example of such comparisons. The National League 
has been outclassed by the American League teams in a half- 
century of World Series competition. The American League 
has won about 58 per cent of the games and 65 per cent of the 
Series. The probability that the better team wins the World 
Series is estimated as 0.80, and the American League is esti- 
mated to have had the better team in about 75 per cent of the 
Series. 


INTRODUCTION 


IF WE compare pairs of teams, products, drugs, or persons on the
basis of a fixed number of binomial trials, and identify the member
of the pair that wins the majority of trials as the better, we may be in 
error. For example, if the members of a pair are evenly matched, a 
decision on the basis of performance is equivalent to coin-flipping. On 
the other hand, if one of a pair is actually better, it is more likely that 
the better member will also be the winner. If we carry out such com-
parisons on many pairs under roughly comparable conditions, it is of
interest to estimate the over-all effectiveness of the decision technique
in the kinds of situations that have occurred in practice. Data from
the World Series are available to illustrate many facets of this type of
problem.

* This work was facilitated by support from the Laboratory of Social Relations, Harvard Univer-
sity.

It is a pleasure to acknowledge the numerous suggestions and criticisms made by various friends,
though I have not availed myself of all their advice. Harry V. Roberts (University of Chicago) and
Howard L. Jones (Illinois Bell Telephone Company) were so prodigal with suggestions that the paper
has more than doubled in length since its first draft. I wish to thank K. A. Brownlee (University of
Chicago), William G. Cochran (Johns Hopkins University), Herbert T. David (University of Chicago),
Joseph L. Hodges (Universities of Chicago and California), and William H. Kruskal (University of
Chicago) for their reading of the manuscript in early draft. My greatest debt is to Mrs. Doris Entwisle
(Harvard University) who assisted with the computations, with gathering the data, and with the prepa-
ration of the manuscript.

About World Series time each year most fans are wondering which 
team will win. The author is no exception, but he has also wondered 
about another question: Will the Series be very effective in identifying 
the better team? 

By assuming that the probability that a particular team wins single 
games is fixed throughout a series, we can readily illustrate our point. 
Suppose team A and team B are matched in a series, and that the 
probability that A wins single games is p=0.52. Then team A is the 
better team, because it has the higher probability of winning single 
games, but in a short series team A may have little better than a 50-50
chance of winning. For example, in a 3-game series, team A can win
by winning the first two games, the first and third, or the second and
third. The probabilities of these three outcomes are (0.52)²,
(0.52)(0.48)(0.52), and (0.48)(0.52)², respectively. These values add up to
0.529984, or about 0.530, which means that team A’s chance to win 
the 3-game series is not a very big improvement over its chance to 
win a coin-flipping contest. The corresponding computation for a seven 
game series shows a small improvement in the probability that team 
A wins; the value is about 0.544. In principle, the longer the series, 
the better chance team A has to win. But practical considerations al- 
ways limit the lengths of such comparison series. 
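The arithmetic in this paragraph can be reproduced with a few lines of code. The sketch below assumes the fixed-p binomial model described in the text; the function name series_win_prob is an illustrative choice, not notation from the paper.

```python
from math import comb

def series_win_prob(p: float, n: int) -> float:
    """Probability that a team with single-game win probability p wins a
    majority of n games (n odd), computed as if all n games were played."""
    need = n // 2 + 1  # games required for a majority
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(need, n + 1))

print(round(series_win_prob(0.52, 3), 6))  # 0.529984
print(round(series_win_prob(0.52, 7), 3))  # 0.544
```

Lengthening the series from three to seven games moves the better team's chance only from about 0.530 to about 0.544 when p = 0.52.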

In its simplest form, the question of the effectiveness of series in 
identifying the better team can be raised with respect to League play- 
offs. The American League uses a one-game play-off, while the National 
League uses a three-game play-off to settle first-place ties, and thus to 
choose their representative in the World Series. Intuitively one might 
suppose that the National League’s three-game play-off would be more 
sensible, because the longer the series, the better chance the more skill- 
ful team has to emerge the victor, as the example in the previous para- 
graph suggests. On the other hand, if the teams have played 154 games 
to a dead heat, there is considerable evidence available to suggest that 
these teams’ chances of winning single games are roughly equal, and 
therefore that the flip of a coin will be nearly as sensitive in deciding 
which team is better as the actual play of an additional one- or three- 
game series. An exception to this might occur when one of the tied 
teams had been improving as the season progressed, while the play of 
the other was falling off. Then the improving team would be expected 
to benefit from the longer series on the average, because its probability
of winning single games would be larger than 0.5.

As a simple model for the League play-offs, we might suppose that 
one team has probability p of winning single games, and that the other 
has probability 1 - p, and further, that p remains the same for all games
in the series under consideration. The reader may easily think of argu- 
ments against the assumption of the constancy of p from game to 
game (pitchers and ball parks are examples of variables that could 
contribute to variation in p). We will provide some evidence on this 
issue later. Corresponding to the probability p of winning a single game, 
we need the probability of winning an n-game series. We will call this 
probability S(p, n). By an n-game series we mean a series that can last 
n games. Thus n=7 in recent World Series, though the number of 
games played may be as few as four. And, in general, an n-game series 
may be stopped considerably short of n games because play stops as 
soon as one team has won a majority. In acceptance sampling language, 
we would call this “truncated single sampling.” Although the point is 
obvious, we note that S(p, n) is the same whether the series is played 
to the full n games, or only played until a majority is won by one team. 
This is clear because no decisions about the name of the winner are 
changed once a particular team has a majority. This fact is useful be- 
cause it means S(p, n) can be computed directly from the binomial 
expansion as if all games had been played, instead of from the less 
familiar and less well tabulated distribution appropriate to truncated 
single sampling. With this “fixed p” model, slight deviations from equal 
probabilities of winning single games will lead to very little gain in the 
probability that the better team wins in a three-game play-off as com-
pared with a one-game play-off. If the probability of the better team
winning a single game is somewhat more than half, say p = 1/2 + ε
(1/2 ≥ ε > 0), then the probability it wins a three-game series is

$$
\begin{aligned}
S(p, 3) &= p^3 + 3p^2(1 - p) = p^2(3 - 2p) \\
        &= \left(\tfrac{1}{2} + \epsilon\right)^2 (2 - 2\epsilon) \\
        &= \tfrac{1}{2} + \tfrac{3}{2}\epsilon - 2\epsilon^3,
\end{aligned}
$$

and the increase in the probability of correctly choosing the better
team as we go from the one-game to the three-game play-off is

$$
S(p, 3) - S(p, 1) = \tfrac{1}{2}\epsilon - 2\epsilon^3.
$$


If ε = 0.01 (p = 0.51), the gain in the probability S is essentially 0.005
for a three-game as compared with a one-game series. This means that
in 200 one-game play-offs, the better team could expect to win 102 
play-offs, but in 200 three-game play-offs, the better team could expect 
to win 103 play-offs, a scarcely noticeable improvement. There are few 
data for comparing play-off series within the major leagues, because 
ties are rare. Further, one-game play-offs would not provide any in- 
formation on the point at hand. We will not therefore be able to pursue 
this question of differences between play-off teams to any conclusion, 
and so we proceed directly to consideration of World Series, where 
data are more plentiful. As a simple means of examining the power of a 
series for identifying the better team we provide Table 1. In Table 1 
(suggested by K. A. Brownlee) we give the probability S(p, n) that 
the better team wins an n-game series for n=1, 3, 5, 7, 9, and for 
various probabilities p. We also give the probability that the better 
team wins, or ties, in even-sized series n=2, 4, 6, 8. 
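A table of this kind can be generated directly from the binomial distribution. The following is a minimal sketch under the fixed-p model; the function name win_or_tie_prob and the grid of p values are assumptions made for illustration and are not taken from Table 1.

```python
from math import comb

def win_or_tie_prob(p: float, n: int) -> float:
    """P(better team wins at least half of n games).
    For odd n this equals S(p, n); for even n it includes ties."""
    need = (n + 1) // 2
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(need, n + 1))

# Hypothetical grid of single-game probabilities (not the grid of Table 1).
for p in (0.50, 0.55, 0.60, 0.65, 0.70):
    row = "  ".join(f"{win_or_tie_prob(p, n):.3f}" for n in range(1, 10))
    print(f"p={p:.2f}  {row}")

# Gain from a three-game over a one-game play-off when p = 1/2 + eps.
eps = 0.01
gain = win_or_tie_prob(0.5 + eps, 3) - win_or_tie_prob(0.5 + eps, 1)
print(gain, eps / 2 - 2 * eps**3)
```

The last line compares the computed gain with the closed form ε/2 - 2ε³ derived in the text; the two agree up to floating-point rounding.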


COMPARISON OF MAJOR LEAGUES 


One way to approach the question of equality of teams entering the
World Series is to compare the two Leagues. Altogether there have 
been 44 seven-game Series from 1905 to 1951, and 4 nine-game Series 
(1903, 1919-1921, no Series in 1904), for a total of 275 games actually 
played. The American League has won 159 of these games, or 57.82 
per cent. In any year i, the American League team is assumed to have
had a probability p_i of winning single games. Then over the 48 years
of World Series games there would be an average probability, say
p = Σp_i/48. If we adopt this view, then the average proportion of
games won, 0.5782, is an estimate of p. Some might object to this esti- 
mate of p because the sampling is truncated (a seven-game Series is 
stopped when one team wins 4 games), or because we would have a 
better estimate if we averaged over the number of Series rather than 
the total number of games. 

These objections are both reasonable, but it happens to turn out that 
the use of an estimate suited to truncated sampling gives almost exactly 
the same numerical result. We introduce this estimate because we plan 
to base a later argument on it. An unbiased estimate appropriate to 
truncated sampling can be obtained in the following manner:¹

¹ M. A. Girshick, F. Mosteller, and L. J. Savage, “Unbiased estimates of certain binomial sampling
problems with applications,” Annals of Mathematical Statistics, 17 (1946), 20.

[TABLE 1]

In a year when the American League wins, the estimate of p is taken
to be

$$
\hat{p} = \frac{c - 1}{c + x - 1},
$$

where c is the number of games it takes to win a Series and x is the
number won by the National League; when the American League loses,
the estimate is

$$
\hat{p} = \frac{y}{c + y - 1},
$$

where y is the number won by the American League. In our data the
value of c is either 4 or 5 depending on whether a seven- or nine-game
Series is used. The summary table (Table 2) shows the Series outcomes
and the estimates corresponding to these outcomes. In addition, the
four estimates of American League p_i for the nine-game Series are 4/7,
3/7, 4/6, 3/7. The average of the 48 estimates is 57.80 per cent, pro-
viding an estimate of 100p that is scarcely different from the more
naive per cent won. If we think of the team representing the American
League as having a possibly different p_i in each Series, then 57.80 per
cent is, in a reasonable sense, an estimate of the average probability of
winning single games over the years.

TABLE 2
OUTCOMES OF THE 44 SEVEN-GAME SERIES*

  Games Won       Estimate of p
N.L.    A.L.        for A.L.       Frequency
 4       0            0                3
 4       1           1/4               4
 4       2           2/5               1
 4       3           3/6               7
 3       4           3/6               4
 2       4           3/5              10
 1       4           3/4               9
 0       4            1                6
Total                                 44

* Data from The World Almanac 1952, New York World-Telegram, Harry Hansen (Ed.), p. 821.
These data have also been checked in The Official Encyclopedia of Baseball, Jubilee Edition, Hy Turkin
and S. C. Thompson (New York: A. S. Barnes Co., 1951).
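The unbiasedness of the truncated-sampling estimate can be checked by exact enumeration. The sketch below assumes the estimate takes the form (c - 1)/(c + x - 1) when the team wins and y/(c + y - 1) when it loses, as implied by the estimates listed for the seven- and nine-game Series; the function name expected_estimate is illustrative.

```python
from fractions import Fraction
from math import comb

def expected_estimate(p: Fraction, c: int = 4) -> Fraction:
    """Expected value of the truncated-sampling estimate of p when the
    Series ends as soon as one team has won c games."""
    q = 1 - p
    total = Fraction(0)
    for k in range(c):  # k = games won by the Series loser
        ways = comb(c - 1 + k, k)  # orderings that end on the winner's c-th win
        total += ways * p**c * q**k * Fraction(c - 1, c + k - 1)  # team wins
        total += ways * q**c * p**k * Fraction(k, c + k - 1)      # team loses
    return total

p = Fraction(13, 20)
print(expected_estimate(p, c=4) == p)  # seven-game Series
print(expected_estimate(p, c=5) == p)  # nine-game Series
```

Exact rational arithmetic is used so that the equality with p is tested without rounding error.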

Another question is whether the American League has done signifi- 
cantly better than the National League. We could check, merely on 
the basis of the number of Series won. The American League has won 
31 of 48 Series, and under the null hypothesis of p = 1/2, the proba-
bility² of 31 or more successes is 0.0297, or a two-sided probability of
about 0.06.
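The tabled tail probability can be recomputed from the binomial distribution. This is a small sketch; the function name upper_tail is an illustrative choice.

```python
from math import comb

def upper_tail(n: int, k: int) -> float:
    """P(X >= k) for X binomial with n trials and success probability 1/2."""
    return sum(comb(n, j) for j in range(k, n + 1)) / 2**n

one_sided = upper_tail(48, 31)
print(round(one_sided, 4), round(2 * one_sided, 2))
```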

Although it has little to do with the main discussion, another fre- 
quently-asked question is whether the American League has been im- 
proving through the years. To answer this question, at least partially, 
we break the 48 Series into four sets of 12, chronologically in Table 3. 
There is a slight but not statistically significant trend in the data. Of
course there is one notable trend by the New York Yankees (A.L.)
suggested by Table 4. Just what sort of significance test should be ap-
plied to a team chosen on the basis of its notable record is an issue not
at present settled by statistical theory, so we leave Table 4 without
further analysis.

² Tables of the Binomial Probability Distribution, National Bureau of Standards, Applied Mathe-
matics Series 6 (U. S. Government Printing Office, 1950), p. 375.

TABLE 3
NUMBER OF SERIES WON BY THE AMERICAN LEAGUE
IN 12 YEAR INTERVALS

Years                 1903-15*   1916-27†   1928-39   1940-51   Total
Series Won by A.L.       7          7          9         8      31 of 48

* No Series in 1904.
† Includes National League victory in 1919, year of “Blacksox Scandal.”


TABLE 4 


TABLE OF WORLD SERIES WON AND LOST BY 
YANKEES BY YEARS 








Years Won Lost Totals 





1903-27 2 3 5 
1928-51 11 1 12 
Totals 13 4 17 





ESTIMATING THE PROBABILITY THAT THE BETTER TEAM 
WINS SINGLE GAMES 


Since the American League has done rather well through the years, 
we will reject the idea that the league champions are equally matched 
(p = 1/2) when they appear for the Series. How closely are they
matched? Suppose each year we knew the single game probability p_i
for the better team; then the average of such p's could be a measure of
how well the teams were matched. We expect the average p to be
greater than 0.5. Indeed in the present case our estimate for the average 
p for better teams should be higher than the 57.8 per cent we obtained 
for the American League, because we suspect that the American League 
team was in some years not the “better team.” There should be no 
confusion here between the “winning team” and the “better team.” 
The “winning team” is the team that wins the Series. The “better 
team” is the team with the higher probability of winning single games, 
whether or not it actually wins the Series. We anticipate that the better
team sometimes loses a Series, just as the league champion loses single 
games to the last-place team within the league during the season. 
Model A: The better team has the same p each year. The author has 
not discovered any good way of estimating the average p for better 
teams without making further unrealistic assumptions (the lack of re- 
ality may not be important, because the results may not be sensitive 
to the assumptions). What has been assumed in this section is that 
every year the better team has the same probability p > 1/2 of winning
single games. Of course, we cannot identify the better team in any 
particular Series, but we may by arithmetic manipulation derive an 
estimate from our half-century of data. For this purpose, we will neglect 
the four nine-game Series, because they cause considerable arithmetic 
trouble. Our data and Model A can be summarized in Table 5. The
algebraic expressions in the right-hand column are not the usual terms
in the expansion of the binomial (p+q)⁷ because we are working with
truncated single sampling. On each line the first algebraic term repre-
sents the probability that the better team wins the Series in the number
of games appropriate to the line, while the second term similarly repre-
sents the probability that the poorer team wins the Series. The sum of
these two terms represents the total probability that the Series is won
in the pattern of games given in the first two columns. Thus in the
third line 10p⁴q² is the probability that the better team wins the Series
in exactly six games.

TABLE 5
GAMES WON (SEVEN-GAME SERIES ONLY)

Winner   Loser   Frequency   Theoretical Proportion
  4        0         9       p⁴ + q⁴
  4        1        13       4p⁴q + 4pq⁴
  4        2        11       10p⁴q² + 10p²q⁴
  4        3        11       20p⁴q³ + 20p³q⁴
Total               44       1

If we represent a win by the better team as a B and a win by the 
poorer team as a W, the following 10 ways for the better team to win 
in exactly 6 games exhaust the possibilities: 
BBBWWB    BWBWBB
BBWBWB    WBBWBB
BWBBWB    BWWBBB
WBBBWB    WBWBBB
BBWWBB    WWBBBB

These 10 ways correspond to the coefficient 10 in 10p⁴q². The factor
p⁴q² arises from the fact that we must have exactly 4 wins by the better
team, whose single game probability is p, and exactly 2 wins by the
poorer team whose single game probability is q = 1 - p. Similar com-
putations account for the coefficients and powers of p and q correspond-
ing to the other Series outcomes.
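The coefficients can also be obtained by brute-force enumeration of win-loss orderings. In the sketch below, B and W follow the notation of the text; the function name clinching_sequences is an assumption made for illustration.

```python
from itertools import product

def clinching_sequences(total_games: int, wins_needed: int = 4) -> list:
    """Orderings in which the better team (B) wins its final required game
    in exactly total_games games."""
    seqs = []
    for s in product("BW", repeat=total_games):
        # The last game must be a B win, and B must have exactly wins_needed wins.
        if s[-1] == "B" and s.count("B") == wins_needed:
            seqs.append("".join(s))
    return seqs

six = clinching_sequences(6)
print(len(six), six)
print([len(clinching_sequences(g)) for g in (4, 5, 6, 7)])  # [1, 4, 10, 20]
```

The counts 1, 4, 10, and 20 are the coefficients in the theoretical proportions of Table 5.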

To estimate p for better teams we should not take merely the total 
number of games won by Series-winning teams and divide by the 
grand total of games. This will clearly overestimate the p-value, as an
example with p = 1/2 will show.

TABLE 6
EXPECTED RESULTS FOR 64 SERIES, p = 1/2

   Games Won        Expected
Winner   Loser      Frequency
  4        0            8
  4        1           16
  4        2           20
  4        3           20
Total                  64

In Table 6, based on hypothetical data for equally matched teams, it
turns out that the Series-winning teams won 256 of a total of 372
games, or 68.8 per cent of the games, but an estimate of p = 0.688
would be rather far from the actual p = 0.500. On the other hand, in
the actual Series results, Table 5, the per cent of games won by the
Series-winning team is only 72.1 (176 of a total of 244) which seems
rather close to 68.8, so perhaps the previous assumption that the teams
are unevenly matched is not in line with the facts. To investigate the
facts, we need an estimation process. Three estimates seem reasonable:

1) Use the theoretical distribution to obtain a formula for the ex-
pected number of games won by the losing team in a 7-game series.
This average will be a function of p. Equate this theoretical average to
the observed average and solve for p. This is the method of moments
applied to the sample mean.³

2) Obtain the maximum likelihood estimate of p. 

3) Obtain the minimum chi-square estimate of p. 

Method 1) is by far the easiest computationally. 

Method 1. We wish to obtain the average number of games won by
the Series-losing team in terms of p. We multiply the theoretical pro-
portions from Table 5 by the number of games won by the loser and
add. This operation gives

A = Average number of games won per Series by the Series-loser

$$
\begin{aligned}
A &= 1(4p^4q + 4pq^4) + 2(10p^4q^2 + 10p^2q^4) + 3(20p^4q^3 + 20p^3q^4) \\
  &= 4pq\left[(p^3 + q^3) + 5pq(p^2 + q^2) + 15p^2q^2\right].
\end{aligned}
\tag{1}
$$

We note that

$$
p^3 + q^3 = p^3 + 3p^2q + 3pq^2 + q^3 - 3p^2q - 3pq^2 = 1 - 3pq
$$

and

$$
p^2 + q^2 = p^2 + 2pq + q^2 - 2pq = 1 - 2pq.
$$

Substituting these relations in (1) gives

$$
\begin{aligned}
A &= 4pq\left[1 - 3pq + 5pq(1 - 2pq) + 15p^2q^2\right] \\
A &= 4pq\left[1 + 2pq + 5p^2q^2\right].
\end{aligned}
\tag{2}
$$

The value of A attains its maximum when p = 1/2 as we would antici-
pate. In the 44 seven-game Series, the average number of games per
Series won by the Series-loser was 1.5455. If we set this equal to A in
equation (2) we can solve directly a cubic in pq and then a quadratic
in p, or we might go directly to the 6th degree equation in p. Modera-
tion and discretion suggest that we just substitute a few values of p in
the expression, and see what values of p lead to outcomes close to the
average wins of the Series-loser. We get





  p         A = Average Wins
            Expected by Loser
0.5             1.8125
0.6             1.6973
0.6500          1.5596
0.6667          1.5034
0.7             1.3780

³ It was suggested by William Kruskal (personal communication) that other estimates based on the
method of moments seem equally plausible. He suggests as an example the statistic

    (Number of games won by loser) / (Number of games in Series).

Calculations similar to those in the text give the average value of this statistic as

$$
B = 4pq\left[\frac{1}{5} + \frac{7}{30}pq + \frac{10}{21}p^2q^2\right].
$$

When this is equated to its observed average (about 0.25), the estimate of p turns out slightly higher
than 0.65.


Linear interpolation gives the estimate of p as 0.6542. Of course, the 
uncertainty of the estimate makes the use of this many decimal places 
quite misleading. On the basis of the evidence thus far available, then, 
the teams entering the World Series seem to be matched at about 65- 
35 for single games. 
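Instead of interpolating in the table, equation (2) can be solved numerically. The sketch below uses bisection, which is a substitute for the author's linear interpolation rather than his procedure; the function names are illustrative. It relies on A decreasing as p rises from 0.5 to 1.

```python
def expected_loser_wins(p: float) -> float:
    """Equation (2): A = 4pq[1 + 2pq + 5p^2 q^2], with q = 1 - p."""
    u = p * (1 - p)
    return 4 * u * (1 + 2 * u + 5 * u * u)

def solve_for_p(target: float, lo: float = 0.5, hi: float = 1.0,
                tol: float = 1e-10) -> float:
    """Bisection on [0.5, 1], where A(p) is decreasing in p."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if expected_loser_wins(mid) > target:
            lo = mid  # A still too large, so p must be larger
        else:
            hi = mid
    return (lo + hi) / 2

# Observed average from Table 5: loser won 0, 1, 2, 3 games in 9, 13, 11, 11 Series.
observed = (0 * 9 + 1 * 13 + 2 * 11 + 3 * 11) / 44
p_hat = solve_for_p(observed)
print(round(observed, 4), round(p_hat, 4))
```

The root lies close to the interpolated value of 0.654 reported in the text.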

Method 2. If P(0), P(1), P(2), P(3) are the probabilities that the
Series-losing team wins 0, 1, 2, or 3 games respectively in a Series, then
the maximum likelihood approach involves finding the value of p that
maximizes

$$
[P(0)]^{9}\,[P(1)]^{13}\,[P(2)]^{11}\,[P(3)]^{11}.
$$


The numbers 9, 13, 11 and 11 are the frequencies tabulated in Table 5 
and the P(x) are given in algebraic form in the Theoretical Proportions 
column in Table 5. Although tedious, this maximization was done, and 
the estimate obtained was 0.6551, encouragingly close to that obtained 
from Method 1. 
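The maximization can be carried out numerically on the log-likelihood. The following sketch uses a ternary search on the interval (0.5, 0.99), which assumes the log-likelihood has a single peak there; the search method and the names are illustrative choices and not the author's computation.

```python
from math import log

FREQ = [9, 13, 11, 11]  # Series in which the loser won 0, 1, 2, 3 games (Table 5)

def loser_win_probs(p: float) -> list:
    """Theoretical proportions of Table 5 for single-game probability p."""
    q = 1 - p
    return [p**4 + q**4,
            4 * p**4 * q + 4 * p * q**4,
            10 * p**4 * q**2 + 10 * p**2 * q**4,
            20 * p**4 * q**3 + 20 * p**3 * q**4]

def log_likelihood(p: float) -> float:
    return sum(f * log(prob) for f, prob in zip(FREQ, loser_win_probs(p)))

def ternary_max(f, lo: float, hi: float, iters: int = 200) -> float:
    """Locate the maximum of a single-peaked function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2

p_mle = ternary_max(log_likelihood, 0.5, 0.99)
print(round(p_mle, 3))
```

The maximizing value falls near 0.655, in line with the estimate reported in the text.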

Method 3. Finally the chi-square to be minimized was

$$
\chi^2 = \frac{[9 - 44P(0)]^2}{44P(0)} + \frac{[13 - 44P(1)]^2}{44P(1)}
       + \frac{[11 - 44P(2)]^2}{44P(2)} + \frac{[11 - 44P(3)]^2}{44P(3)},
$$

where P(x), x = 0, 1, 2, 3 has the same definition as before. Here the
terms 44P(x) are the “expected numbers” for the usual chi-square
formula. The p-value minimizing this chi-square turned out to be
0.6551.
The following table summarizes the results for the three methods. 











Method Estimate 
Average Wins by Series Loser 0.6542 
Maximum Likelihood 0.6551 
Minimum Chi-square 0.6551 


Presumably the close agreement between the minimum chi-square 
method and the maximum likelihood method is partly an accident of 
the particular empirical data, and partly owing to 44 being a fairly 
large number. For contingency problems equivalent to the present one,
Cramér⁴ points out that “the modified chi-square minimum method
is . . . identical with the maximum likelihood method.” However the
“modified chi-square minimum method” neglects certain terms of the
form:

$$
\frac{[n_x - nP(x)]^2}{2n[P(x)]^2}, \qquad n = \text{total number}, \quad n_x = \text{observed number of } x\text{'s},
$$


when the partial derivatives appropriate to minimizing chi-square are 
set equal to zero. We would not in general expect such a term to vanish, 
but the closeness of agreement suggests to the author that in the future 
he will usually prefer the easier maximum likelihood to the more tedious 
minimum chi-square when totals even as small as 44 are involved. 

Using the estimate obtained by the minimum chi-square method, the
observed value of χ² turns out to be 0.222, which for two degrees of
freedom is fairly small, the probability of a larger value of chi-square
being about 0.89, so the fit is rather good.
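The minimum chi-square estimate and the goodness of fit can be reproduced numerically. The sketch below uses a simple grid search over p, an illustrative choice rather than the author's method, and the fact that the upper tail of a chi-square variable with two degrees of freedom is exp(-x/2).

```python
from math import exp

OBSERVED = [9, 13, 11, 11]  # Table 5 frequencies for loser wins 0, 1, 2, 3

def loser_win_probs(p: float) -> list:
    q = 1 - p
    return [p**4 + q**4,
            4 * p**4 * q + 4 * p * q**4,
            10 * p**4 * q**2 + 10 * p**2 * q**4,
            20 * p**4 * q**3 + 20 * p**3 * q**4]

def chi_square(p: float) -> float:
    n = sum(OBSERVED)
    return sum((obs - n * prob) ** 2 / (n * prob)
               for obs, prob in zip(OBSERVED, loser_win_probs(p)))

# Grid search over p = 0.5001, 0.5002, ..., 0.9899 (grid spacing is an assumption).
chi2_min, p_min = min((chi_square(0.5 + i / 10000), 0.5 + i / 10000)
                      for i in range(1, 4900))
tail = exp(-chi2_min / 2)  # P(chi-square with 2 d.f. exceeds chi2_min)
print(round(p_min, 4), round(chi2_min, 3), round(tail, 2))
```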


HOW OFTEN DOES THE BETTER TEAM WIN THE SERIES? 


Based on past experience then, a reasonable estimate of the average 
probability of winning single games for the better team is about 0.65 
according to Model A. Using this value, we can compute the prob- 
ability of the better team winning a seven-game World Series as about 
S(0.65, 7) =0.80 (see Table 1), so the better team would win about 
four out of five Series. If we push our assumptions to an extreme we 
might even estimate that the American League has had the better team 
about 75 per cent of the time. We can obtain this number by assuming 
that the American League had the better team a fraction of the time x,
and recall that the American League won 31 of 48 Series. Then equating
expected and observed proportions of Series won we have:

    0.80x + 0.20(1 - x) = 31/48 = 0.646
    0.60x = 0.446
    x = 0.743.


Alternatively, we could use games won rather than Series won. Using
our estimate of 0.65 as the single game probability for the better team,
and recalling that the American League has won 0.578 of the games we
have:

    0.65x + 0.35(1 - x) = 0.578
    0.30x = 0.228
    x = 0.76.

⁴ Harald Cramér, Mathematical Methods of Statistics (Princeton, N. J.: Princeton University Press,
1946), p. 426.


These two methods both give estimates of about 75 per cent as the 
percentage of years in which the American League has had the better 
team. If the American League has had the better team 75 per cent of 
the time, or in about 0.75(48) = 36 Series, these 36 better teams could
expect to win about 0.80 × 36 = 29 Series and lose about 36 - 29 = 7
Series. Similar computations suggest that the American League has 
had the poorer team 12 times, and that these poorer teams could expect 
to win two Series from their better National League opponents. The 
discussion just given shows why we do not use the per cent of Series 
won (65) as an estimate of the per cent of times the American League 
has had the better team. The side more often having the better team 
will suffer most in actual play due to lack of discrimination of the 
7-game Series. If one League always had the better team, it would still 
lose a good many Series unless its single game p were quite high. 
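The two calculations above share one linear equation, which can be written as a small helper. The function name better_team_share is hypothetical; the inputs are the figures quoted in the text.

```python
def better_team_share(observed_rate: float, p_better: float) -> float:
    """Solve p_better * x + (1 - p_better) * (1 - x) = observed_rate for x,
    the fraction of years in which a league had the better team."""
    return (observed_rate - (1 - p_better)) / (2 * p_better - 1)

x_series = better_team_share(31 / 48, 0.80)  # based on Series won
x_games = better_team_share(0.578, 0.65)     # based on games won
print(round(x_series, 3), round(x_games, 2))  # 0.743 0.76

# Expected bookkeeping for 48 Series when x is taken as 0.75.
better_years = 0.75 * 48
print(better_years, round(0.80 * better_years), round(0.20 * (48 - better_years)))
```

The last line reproduces the rounded counts in the text: 36 years with the better team, about 29 of those Series won, and about 2 Series won in the 12 years with the poorer team.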

It might be supposed that the estimate of 80 per cent for the prob- 
ability that the better team would win the Series would depend sensi- 
tively on the Model A assumption of a constant value of p for the better 
team in every year. The reader might be willing to accept the idea that 
our estimate of 0.65 as an average for better teams is reasonable, but 
feel that since there is surely a distribution of p values for better teams 
from year to year, the average of the S(p_i, 7) may not be close to
S(p, 7). (It will be recalled that S(p, 7) is the probability that a team
with a single-game probability of p will win a 7-game Series.) For ex-
ample, let p_1 = 0.50, p_2 = 0.90, and then the average p = (0.50+0.90)/2
= 0.70, while S(0.50, 7) = 0.50, S(0.90, 7) = 1.00, but S(0.70, 7) = 0.87
instead of 0.75 = ½(0.50+1.00). This example shows that the average
of the S(p_i, 7) need not equal S evaluated at the average p.

How important this objection is depends on the linearity of S(p, 7)
over the dense part of the distribution of p’s from year to year. A 
graph of S(p, 7) reveals that S is approximately linear in p over the 
range from p=0.50 to p=0.75. We would suppose that in World Series 
competition it would be relatively rare that single game p’s exceeded 
0.75, and therefore we feel that the lack-of-linearity argument is not a 
strong one against this estimate of 0.80 as the average probability that 
the better team wins the Series. 
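The numerical example and the shape of S(p, 7) can be examined directly. This is a minimal sketch under the fixed-p model; series_win_prob is an illustrative name, and the grid of p values printed at the end is an assumption.

```python
from math import comb

def series_win_prob(p: float, n: int = 7) -> float:
    """S(p, n): probability of winning a majority of n games (n odd)."""
    need = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(need, n + 1))

avg_of_s = (series_win_prob(0.50) + series_win_prob(0.90)) / 2
s_of_avg = series_win_prob(0.70)
print(round(avg_of_s, 3), round(s_of_avg, 3))

# S(p, 7) on a grid from 0.50 to 0.75, to inspect how close to a line it runs.
for i in range(6):
    p = 0.50 + 0.05 * i
    print(f"{p:.2f}  {series_win_prob(p):.3f}")
```

The first line of output shows the gap between the average of the S values and S at the average p for the two-point example in the text.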


AN ALTERNATIVE METHOD OF ESTIMATION 


Model B. Fixed p’s within Series, but normally distributed p’s from 
year to year. There is another way of estimating the number of times 
the American League has had the better team. We can take the view 
that each year the American League team has a true but unknown 
probability of winning single games p_i. For each of these yearly p_i we
have an unbiased estimate p̂_i; the distribution of these estimates was
given in Table 2. If we let the observed mean of the estimates be p̂, we
can compute the sum of squares of deviations of the estimates p̂_i from
p̂. This sum of squares can be partitioned into two parts. One part has
to do with σ²(p̂_i), the variation of p̂_i around its true value p_i, and the
other with σ²(p_i), the variation of the true p_i around their true mean p.
Such a partition is standard practice in analysis of variance. We need
to define

$$
\begin{aligned}
\sigma^2(\hat{p}_i) &= E(\hat{p}_i - p_i)^2, \\
\sigma^2(p_i) &= \frac{\sum (p_i - p)^2}{n},
\end{aligned}
\tag{3}
$$

where E is the expected value operator. We need the expected value of
Σ(p̂_i - p̂)²; it is

$$
\begin{aligned}
E\left[\sum_{i=1}^{n} (\hat{p}_i - \hat{p})^2\right]
  &= E\left[\sum_{i=1}^{n} \hat{p}_i^2 - n\hat{p}^2\right] \\
  &= E\left[\frac{n-1}{n}\sum_{i=1}^{n} \hat{p}_i^2 - \frac{2}{n}\sum_{i<j} \hat{p}_i \hat{p}_j\right].
\end{aligned}
\tag{4}
$$

Recalling that

$$
\begin{aligned}
E(\hat{p}_i^2) &= \sigma^2(\hat{p}_i) + p_i^2, \\
E(\hat{p}_i \hat{p}_j) &= p_i p_j \quad (i \ne j),
\end{aligned}
\tag{5}
$$
and using these results in (4) we have the well-known result expressed
in words and symbols:

    Total Sum of Squares = Within Years Sum of Squares + Between Years Sum of Squares

$$
E\sum (\hat{p}_i - \hat{p})^2 = \frac{n-1}{n}\sum \sigma^2(\hat{p}_i) + n\sigma^2(p_i).
\tag{6}
$$

The value of Σ(p̂_i - p̂)² can be computed from Table 2. We would like
to know σ²(p_i) as an aid in estimating the average p for better teams.
To get an estimate of σ²(p_i) we will have to estimate Σσ²(p̂_i). By a
procedure like that used in deriving the average number of games won
by the Series-loser (equation (2)) we can show that

$$
\sigma^2(\hat{p}_i) = \frac{p_i q_i}{20}\left[5 - 3p_i q_i - 4p_i^2 q_i^2\right], \qquad q_i = 1 - p_i.
\tag{7}
$$

The derivation of (7) is lengthy, but is shown in the Appendix. Natu-
rally this variance depends on the true p_i, but its value does not change
rapidly in the neighborhood of p_i = 1/2. Therefore we propose to esti-
mate this error variance by evaluating σ²(p̂_i) at the average value of
the p_i's. For the 44 seven-game Series p̂ is 0.583. Substituting this
p-value in the formula for the variance (7) gives

    σ²(p̂_i) = 0.0490 (error variance).

We use this same value of σ²(p̂_i) for all years. The total sum of squares
Σ(p̂_i - p̂)² for the 44 Series is 2.8674. The estimated between years
sum of squares Σ(p_i - p)² is

    2.8674 - 43(0.0490) = 0.7604.

Dividing this by 44 gives an estimate of the variance of the true p_i's
from year to year as 0.0173, or an estimated standard deviation of
p-values σ(p_i) = 0.1315. The departure of the observed average, 0.583,
from 0.500 in standard deviation units is 0.63. If we assume that the
distribution of true p's is normal, we estimate the percentage of times
the American League had the better team to be 74 per cent. This result
is close to our previous estimate of 76 per cent.
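Formula (7) can be checked against a direct enumeration of the eight possible outcomes of a seven-game Series. The sketch below assumes the estimate takes the values 3/(3 + k) for a win and k/(3 + k) for a loss, where k is the number of games won by the Series loser; the function names are illustrative.

```python
from fractions import Fraction
from math import comb

def variance_by_enumeration(p: Fraction, c: int = 4) -> Fraction:
    """Exact variance of the truncated-sampling estimate of p."""
    q = 1 - p
    mean = second = Fraction(0)
    for k in range(c):  # k = games won by the Series loser
        ways = comb(c - 1 + k, k)
        outcomes = ((ways * p**c * q**k, Fraction(c - 1, c + k - 1)),  # team wins
                    (ways * q**c * p**k, Fraction(k, c + k - 1)))      # team loses
        for prob, est in outcomes:
            mean += prob * est
            second += prob * est * est
    return second - mean**2

def variance_by_formula(p: Fraction) -> Fraction:
    """Equation (7) for seven-game Series."""
    u = p * (1 - p)
    return u / 20 * (5 - 3 * u - 4 * u * u)

p = Fraction(583, 1000)
print(variance_by_enumeration(p) == variance_by_formula(p))
print(round(float(variance_by_formula(p)), 4))  # 0.049
```

Evaluated at 0.583, the formula gives the error variance of about 0.0490 used in the text.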

It has been suggested by Howard L. Jones (personal communication)
that we might obtain an improved estimate Σσ²(p̂_i)/n by averaging
the formula of equation (7) over the normal distribution with mean
0.583 and standard deviation 0.1315. When this was done the value
0.0455 was obtained instead of 0.0490. The new between years sum of
squares would be 2.8674 - 43(0.0455) = 0.9109. Dividing 0.9109 by 44 gives a
corrected estimate of the variance of the true p_i's as 0.0207, or an
estimated standard deviation σ̂(p_i) = 0.1439. This adjusted standard
deviation can be used as before for estimating the proportion of p_i's
higher than 0.500. We have the departure from the mean in standard
deviation units as (0.583 - 0.500)/0.1439 or 0.58 units, which corre-
sponds to a proportion of 72 per cent on a normal distribution. Thus
a better approximation for the proportion of times the American
League has had the better team using Model B is 72 per cent. Again
the result is not far from our Model A estimate of 76 per cent.
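The Model B arithmetic for both error variances can be retraced in a few lines. The sketch takes the total sum of squares, the error variances, and the mean as given in the text; the function names are illustrative, and the normal distribution function is computed from the standard error function.

```python
from math import erf, sqrt

def normal_cdf(z: float) -> float:
    return 0.5 * (1 + erf(z / sqrt(2)))

def share_above_half(total_ss: float, error_var: float, mean: float, n: int = 44):
    """Return (sd of true p's, normal-theory share of years with p above 1/2)."""
    between_ss = total_ss - (n - 1) * error_var  # rearrangement of equation (6)
    sd = sqrt(between_ss / n)
    return sd, normal_cdf((mean - 0.5) / sd)

sd1, share1 = share_above_half(2.8674, 0.0490, 0.583)
sd2, share2 = share_above_half(2.8674, 0.0455, 0.583)
print(round(sd1, 4), round(share1, 2))
print(round(sd2, 4), round(share2, 2))
```

The first pair corresponds to the error variance 0.0490 and the second to the adjusted value 0.0455.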

If we make further use of this normality assumption, we can also
estimate the average single-game probability for the better team. We
break the assumed normal distribution of true p_i's for the American
League into two parts. One part is the truncated normal for which
p_i > 1/2 (American League better), the other is the part for which
p_i < 1/2 (National League better). Then we obtain the average p_i for
the American League when it is better, and for the National League
when it is better, and weight each by the relative frequency it repre-
sents. The final result is an estimate of the average single-game p for
the better team. The integration is shown in the Appendix. When
σ(p_i) is taken as 0.1315, the estimate is 0.626, but the improved esti-
mate of σ(p_i) as 0.1439 gives the final estimate as 0.634, which can be
compared with our Model A estimate of 0.655.
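The weighted average described above equals 1/2 plus the mean absolute deviation of p_i from 1/2, which has a closed form when p_i is normal. The sketch below uses that folded-normal expression as a stand-in for the Appendix integration, which is not reproduced here; the function name is illustrative, and the results agree with the figures in the text only to about the third decimal place.

```python
from math import erf, exp, pi, sqrt

def better_team_mean(mean: float, sd: float) -> float:
    """Average single-game probability of the better team when the
    American League p_i is normal with the given mean and sd."""
    d = mean - 0.5
    z = d / sd
    cdf = 0.5 * (1 + erf(z / sqrt(2)))
    # E|p_i - 1/2| for a normal variable (mean of a folded normal)
    mean_abs_dev = sd * sqrt(2 / pi) * exp(-z * z / 2) + d * (2 * cdf - 1)
    return 0.5 + mean_abs_dev

est1 = better_team_mean(0.583, 0.1315)
est2 = better_team_mean(0.583, 0.1439)
print(round(est1, 3), round(est2, 3))
```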


TESTS OF THE BINOMIAL ASSUMPTIONS 


We have emphasized the binomial aspects of the model. The twin 
assumptions needed by a binomial model are that throughout a World 
Series a given team has a fixed chance to win each game, and that the 
chance is not influenced by the outcome of other games. It seems worth- 
while to examine these assumptions a little more carefully, because any 
fan can readily think of good reasons why they might be invalid. Of 
course, strictly speaking, all such mathematical assumptions are in- 
valid when we deal with data from the real world. The question of 
interest is the degree of invalidity and its consequences. Obvious ways 
that the assumptions might be invalid are: 

1) A team might be expected to do better “at home” than it does
“away,” and this would negate a constant probability because even the
shortest Series may be played in two places. This possibility is strongly
suggested both by intuition and by an examination of the results of
regular season games in the major leagues. That it would hold for
World Series games is not a foregone conclusion.

2) Winning a game might influence the chance of winning the next
game, i.e., there may be serial correlation from game to game.

To examine the first of these issues, we collected the detailed results 
of four games in each Series. We chose four because that represents the 
least number played. Games were chosen as early as possible in each 
Series to provide two games played by each team in an “at-home” 
capacity. In a seven-game Series we ideally find the first two games 
played at the National League (American League) park, the next three 
at the American League (National League) park, and the last two in
the National League (American League) park. When we observed this 
pattern, we used the first four games played. This ideal pattern was 
not actually used as often as one might suppose. Sometimes teams 
alternated parks after each game, when extensive travelling was not 
required. Sometimes both teams used the same park as did the New 
York Giants and the New York Yankees in the early days, or the St. 
Louis Cardinals and the St. Louis Browns; in such cases, we took the 
view that the home team was the second team to come to bat. In some 
Series there were ties that had to be thrown out. And sometimes the 
first four games could not be used because three would be played at 
one park followed by some number at the other park. Our final rule 
was to collect for each Series the first two games played with the Na- 
tional League team as the “home team,” and the first two games played 
with the American League team as the home team.⁵ One Series (1922,
N. Y. Giants vs. N. Y. Yankees) had to be omitted because in the four
non-tied games, all won by the Giants, one team was “at home” three
times. Thus we were left with 47 sets of four games.

5 Hy Turkin and S. C. Thompson, The Official Encyclopedia of Baseball, Jubilee Edition (New York:
A. S. Barnes Co., 1951).

The plan of analysis is to compare the same team for two games
“away” and two “at home.” We arbitrarily chose the first “away” team
for the comparison. We counted the number won by that team in its
first two (non-tied) away games and subtracted this from the number
it won in its first two at-home games. This difference is taken as a
measure of the improvement of a World Series team playing at home
over playing away. If this difference is strongly positive or negative on
the average, we would have to reject the notion that the chance of
winning a single game is constant throughout the Series. For example,
in 1949 Brooklyn (NL) was the first away team in the Series with New
York (AL). It won one of its away games, and none of its at-home
games, for an improvement score of minus one. The average improvement
score for the 47 sets of four games was 2/47=0.042, and the
standard error of this mean is approximately 0.14. So the improvement
score is only about a third of a standard error from the null hypothesis
score of zero improvement. Thus far, then, we have no good evidence
for rejecting the constant probability assumption and it has been
shown that the probability of winning a game is not influenced very
much by the “at-home” or “away” status of teams.
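The improvement score and its standard error can be sketched as follows. The detailed game results are not reproduced in this paper, so the standard error here is not computed from the data: it is derived under the binomial model itself, with an assumed single-game probability of 1/2, which gives the largest possible value. The function names are ours.

```python
import math


def improvement_score(away_wins: int, home_wins: int) -> int:
    """Wins in the first two at-home games minus wins in the first two away games."""
    return home_wins - away_wins


def null_standard_error(p: float, n_sets: int) -> float:
    """Standard error of the mean score under a constant single-game probability p.

    Each score is the difference of two independent Binomial(2, p) counts,
    so its variance is 2pq + 2pq = 4pq.
    """
    return math.sqrt(4.0 * p * (1.0 - p) / n_sets)


# The 1949 example: Brooklyn won one away game and no at-home game.
brooklyn_1949 = improvement_score(away_wins=1, home_wins=0)

n_sets = 47
mean_score = 2 / n_sets                      # total score of 2 over 47 sets
se_model = null_standard_error(0.5, n_sets)  # assumption: p = 1/2
ratio = mean_score / se_model

print(brooklyn_1949, round(mean_score, 3), round(se_model, 3), round(ratio, 2))
```

Under this assumption the model-based standard error is close to the 0.14 quoted above, and the mean score is again roughly a third of a standard error from zero.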

A possible rationale for explaining this at-home-away similarity ob- 
served in Series games and not observed in season games suggests itself. 
It may be that travelling fatigues the away team and thus tends to cut 
down proficiency. During the regular season, at-home teams remain 
stationary for long periods, and various opponent teams travel in to 
play them. In the Series, one team has to travel initially, but then both 
teams do equal amounts of travelling until the Series ends. If travelling 
is an important variable influencing outcomes of games, the Series tends 
to equalize this influence much more than regular season games. 

Another possibility is that many teams are tailored to the home park 
because half the games are played at home: for example, the Boston 
Red Sox (A.L.) have a short left-field fence, and therefore hire a good 
many strong left-field hitters. But it may well be that League champions
represent teams that are not much affected by change of park.

Another way to say that trials are not independent is to say that 
they are correlated serially, and obviously this means p changes from 
game to game depending on outcomes of previous games. To test for 
serial correlation, we examined the results of the first four games re- 
gardless of where played. Each of the 48 sets of four games was broken 
into two sets of two games, the first set consisting of game 1 and game 
2, and the second set consisting of game 3 and game 4. If there is serial 
correlation, we might find that winning a game improved the chance 
of winning the next game. To test this we scored the American League 
team in each set of two games and thus constructed the 2×2 table
shown in Table 7. 


TABLE 7

PERFORMANCE OF AMERICAN LEAGUE TEAM
IN 96 SETS OF TWO GAMES

                          Second game
                      Win    Lose    Total
First game   Win       32      24       56
             Lose      24      16       40
             Total     56      40       96

It will be noted that the rows of this table are almost proportional
to one another. There is a slight question about the interpretation of
this result. To work this out, we must explore the situation when we
have independence between the games. Suppose that the American
League team in any particular series of 2 games has a probability $p_i$ of
winning each game. Then the expected values for the particular 2-game
series would be shown by the following table:

                                  Second game
                      Win              Lose             Total
First game   Win      $p_i^2$          $p_i(1-p_i)$     $p_i$
             Lose     $p_i(1-p_i)$     $(1-p_i)^2$      $(1-p_i)$
             Total    $p_i$            $(1-p_i)$        1

The expected value for the total table for all the 2-game series would
be represented by the following table in which we merely sum each entry
of the previous table over the subscript i:

                                  Second game
                      Win                   Lose                  Total
First game   Win      $\sum p_i^2$          $\sum p_i(1-p_i)$     $\sum p_i$
             Lose     $\sum p_i(1-p_i)$     $\sum (1-p_i)^2$      $\sum (1-p_i)$
             Total    $\sum p_i$            $\sum (1-p_i)$        n

In the ordinary test for independence we estimate the Win-Win cell
by multiplying the two Win margins together and dividing by the total
number. In the present case we would have

$$\left(\sum p_i\right)^2 / n$$

as this estimate. Clearly this is not identical to $\sum p_i^2$. But we will show
that in our present problem it is very close. If each $p_i$ is represented as
the sum of a grand mean plus a departure from that mean in the form

$$p_i = \bar{p} + \epsilon_i,$$

where $\epsilon_i$ is the departure, then $p_i^2 = \bar{p}^2 + 2\bar{p}\epsilon_i + \epsilon_i^2$ and, since the
departures sum to zero, $\sum p_i^2 = n\bar{p}^2 + \sum \epsilon_i^2$.
Dividing both sides by n gives

$$\frac{\sum p_i^2}{n} = \bar{p}^2 + \frac{\sum \epsilon_i^2}{n},$$

and the second term on the right-hand side is approximately the variance
of the true $p_i$'s. From our earlier work we have estimated the
standard deviation of the $p_i$'s to be about 0.13 or 0.14, so the variance
of the $p_i$'s would be estimated to be about 0.017 or 0.020. This would
yield an expected discrepancy between $\sum p_i^2$ and $(\sum p_i)^2/n = n\bar{p}^2$ of
about 96 × 0.017 or 96 × 0.020, which is roughly 1.6 or 2.0 units. So in our Table 7
the expected cell values should not be and are not very many cases
away from the result predicted by the products of the margins. Therefore,
approximate independence in the table of pairs of games is consonant
with the notion of game-to-game independence.
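The comparison of Table 7 with the products of its margins can be sketched numerically. The ordinary chi-square statistic below is our own addition for illustration (the text does not quote one); the cell counts are those of Table 7, and the variances 0.017 and 0.020 are the estimates mentioned above.

```python
# Observed counts from Table 7: rows = first game (Win, Lose),
# columns = second game (Win, Lose).
observed = [[32, 24],
            [24, 16]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected counts under independence: product of margins over the total.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Ordinary chi-square statistic for a 2x2 table (1 degree of freedom).
chi_square = sum(
    (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(2)
    for j in range(2)
)

# Discrepancy in the Win-Win cell attributable to variation in the true p_i's.
discrepancy_low = n * 0.017
discrepancy_high = n * 0.020

print(round(expected[0][0], 2), round(chi_square, 3))
print(round(discrepancy_low, 2), round(discrepancy_high, 2))
```

The statistic is far below 3.84, the usual 5 per cent point for one degree of freedom, and the observed Win-Win cell differs from the product-of-margins value by less than one case.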

To sum up: we have made tests of the reasonableness of the assump- 
tion of the constancy of p throughout the Series and of the independ- 
ence of p from game to game, and we have found no reason to reject 
the hypothesis of binomiality, in spite of the fact that it disagrees with 
our intuition or with our knowledge of the facts of games within the 
regular season. 

We have not, of course, completely tested binomiality. We have only 
checked two of the most obvious sources of disturbance that might be 
present. One could check on various additional conjectures for which 
data are available. But the final word on the assumptions would come 
from an analysis of replications of the games, and, of course, there are 
no replications of World Series games. The issue here is that though the 
average p does not change as we go from at-home to away or from 
first to second game, it is still possible that p itself changes from game 
to game, though in no systematic way. Furthermore, the fact that the 
principal assumptions are reasonable when using a model in connection 
with World Series data may not help much if one wishes to use the 
model with other kinds of data. The methods used to investigate the 
agreement of the assumptions with the facts may have value in other 
cases, though, especially when detailed information is available about 
orders-of-test and other pertinent facts. 





ODDS QUOTED FOR SERIES 


Something a good many fans are concerned about is the before-Series 
chances of the contenders. At the suggestion of Harry V. Roberts we 
have gathered the odds quoted in advance of the Series for 36 years— 
these odds are published by betting commissioners or sometimes are 
the odds being used generally by the public for small bets. One way to 
look at these odds is to consider them a group-judgment of the subjec- 
tive probabilities associated with the two teams as they enter the Series. 
Information on betting odds was found in articles in the New
York Times, and for all years represents information quoted on
the morning of the opening day of the World Series. Naturally, in- 
formation reported varies somewhat from year to year, and occasionally 
arbitrary procedures had to be adopted to get quantitative probabili- 
ties to associate with each team and to make the total probability 
unity. The betting fraternity may have been active in the early years, 
but reports for 1913, 1914, and 1915 were vague and seemed very hard 
to quantify. Therefore, these data start in 1916. We list the information 
for 1913, 1914, and 1915: 
New York Times, Oct. 7, 1913   “No odds in the betting”
New York Times, Oct. 9, 1914   “Few backing Mac’s team” ... “American
                               League are favorites but adherents ... not
                               loudly proclaiming powers”
New York Times, Oct. 8, 1915   “There is a little conservative betting ... they
                               are about evenly matched.”


Starting in 1916, however, we observed remarks giving actual odds 
like: “Boston favored 10 to 8” (N. Y. Times, Oct. 7, 1916); “Giant
partisans . . . will wager all the cash Chicago fans want at even money 
but they will not offer odds” (N. Y. Times, Oct. 6, 1917); “ .. . oppres- 
sive silence among the fans who usually make wagers .. . no choice... 
a few Boston enthusiasts willing to wager 6 to 5 on the Red Sox but 
these are small bets” (N. Y. Times, Sept. 5, 1918). In some of the later 
years more detailed information is available giving the odds in both 
directions. 

In the computation, we have averaged the odds in cases where more 
than one set was given (like the 1918 quote above), so that if the big 
money was being wagered at even money and smaller bettors were 
willing to wager at 6 to 5, we have arbitrarily decided that the over-all 
odds prevailing were 5.5 to 5. When the odds are given in both direc- 
tions, the probabilities associated with the two teams do not add up 
to unity, because a percentage has been deducted so that the betting 
commissioners profit no matter who wins. Since we are not immedi- 
ately interested in the breadwinning activities of the betting commis- 
sioners, we have divided the remaining probability (1—sum of prob- 
abilities for the two teams) in half and added equal amounts of prob- 
ability to the estimate for each team. A sample computation will clarify 
this procedure. 

In 1931, the October 1 N. Y. Times gives one to two against the 
Athletics (A.L.) and 8 to 5 against the Cardinals (N.L.). To the fan this 
says: one dollar will get you two dollars if you bet against the Athletics 
and the Athletics lose the Series; if you choose to bet against the Cardinals,
you must wager $8 to win $5 if the Cardinals lose. These figures
lead us to the following preliminary assessment of the total probability:

$$\frac{8}{13} + \frac{1}{3} = \frac{24}{39} + \frac{13}{39} = \frac{37}{39}.$$

37/39 is 2/39 less than unity, so we arbitrarily add 1/39 to the fraction
expressing the odds the bookmakers are giving for each team. Adding
1/39 to 24/39 gives a total probability for the Athletics’ winning of
25/39, or 0.64.
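The 1931 computation can be written out with exact fractions. This is a minimal sketch of the procedure described above; the helper names `implied_probability` and `normalize_pair` are ours, and the reading of each quotation as "amount wagered to amount won" follows the explanation in the text.

```python
from fractions import Fraction


def implied_probability(stake: int, winnings: int) -> Fraction:
    """Win probability at which a bet risking `stake` to win `winnings` breaks even."""
    return Fraction(stake, stake + winnings)


def normalize_pair(p_a: Fraction, p_b: Fraction) -> tuple[Fraction, Fraction]:
    """Split the missing probability (1 - p_a - p_b) equally between the two teams."""
    shortfall = 1 - (p_a + p_b)
    return p_a + shortfall / 2, p_b + shortfall / 2


# 1931: $8 must be wagered to win $5 on the Athletics (8 to 5 against the Cardinals);
# $1 wins $2 on the Cardinals (one to two against the Athletics).
athletics_raw = implied_probability(8, 5)   # 8/13 = 24/39
cardinals_raw = implied_probability(1, 2)   # 1/3  = 13/39

athletics, cardinals = normalize_pair(athletics_raw, cardinals_raw)
print(athletics, cardinals, float(athletics))
```

Exact fractions keep the 1/39 adjustment visible; floating point would serve equally well for the averages reported below.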

Calculations like this were carried through for each year for the Series 
winner starting in 1916. Since the total probability is always unity, the 
probability for the loser is the complement of the winner’s probability. 
(1924 material may not be worth including because of a scandal on the 
eve of the Series. The best information obtainable was “betting odds 
were not decisive for either contender.” We arbitrarily decided to in- 
clude this information and assessed it as a 50-50 situation.) It turns out 
that the average subjective probability associated with the winner is 
55.33 per cent. This leads us to the opinion that the betting fraternity 
has some ability to pick the winner. Furthermore, by looking at the 
average probability for 1916-33 and comparing it with the average 
probability for 1934-51, we see that bettors are getting better at pick- 
ing the winner. The average probability for the winner from 1916-33 is 
0.5028, compared to 0.6039 for 1934-51. Of course, many will argue 
that such a powerful team as the Yankees have had on many occasions 
since 1934 makes prediction easy. Much more pertinent than average 
probabilities—at least from the bettor’s point of view—is the number 
of times a probability greater than 1/2 was associated with the actual 
winner of the Series. These data are shown below, omitting years when 
the probability was 50-50: 

                      Probability greater than    Probability less than
                      0.5 published before        0.5 published before
                      Series                      Series
Number of Winners     24                          8


If this 24-8 split is tested against a 16-16 split (null hypothesis), the 
result is highly significant. Thus the favorite has won 75 per cent of the 
time. Recalling that we estimate that the better team wins only 80 per 
cent of the time, this represents rather good choosing. If we let x represent
the fraction of times the better team is picked as the favorite, then
the fraction of times the favorite would win on the average (using our
previous 0.8 as the probability that better teams win) is 0.8x + 0.2(1 − x).
If we equate this to the observed fraction of the time that the favorite
has won we find

$$0.8x + 0.2(1 - x) = 0.75,$$

$$x = \frac{0.55}{0.60} = 0.917.$$

In other words, we estimate that the better team has been made the
favorite 92 per cent of the time.
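Both steps of this argument are easy to reproduce. The text does not say which significance test was applied to the 24-8 split; the sketch below assumes an exact binomial (sign) test against a probability of 1/2, and then solves the linear equation for x. The function names are ours.

```python
from math import comb


def upper_tail(successes: int, n: int) -> float:
    """P(X >= successes) for X ~ Binomial(n, 1/2)."""
    return sum(comb(n, k) for k in range(successes, n + 1)) / 2 ** n


def fraction_better_favored(observed_rate: float, p_better_wins: float) -> float:
    """Solve p*x + (1 - p)*(1 - x) = observed_rate for x."""
    return (observed_rate - (1.0 - p_better_wins)) / (2.0 * p_better_wins - 1.0)


# Favorites won 24 of the 32 Series in which one team was favored.
one_sided = upper_tail(24, 32)
two_sided = min(1.0, 2.0 * one_sided)

# Fraction of Series in which the better team was made the favorite.
x = fraction_better_favored(observed_rate=24 / 32, p_better_wins=0.8)

print(round(one_sided, 4), round(two_sided, 4), round(x, 3))
```

Under this assumed test the split is significant well beyond the 1 per cent level, in line with the statement above.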

We have also computed the odds given before the Series for the 
American League teams, since in much of this paper attention has been 
directed to the American League. The average before-Series subjective 
probability associated with the American League team is 0.5802; for 
the same period (1916-51) the American League team has won 24 of 
36 Series, or 66.67 per cent. We note a trend toward increasing favorit- 
ism toward the American League team when the available data are 
split in half. For 1916-33 the average probability given for the Ameri- 
can League team is 0.5372 while for 1934-51 the corresponding figure 
is 0.6228. 

The reader may wonder what the boundary for the probability is at 
the upper end—we can see that the favored team does not get odds 
better than 2/3 very often, and the lower limit is 1/2. For the 36-year 
period, the average value for the larger subjective probability is 0.6044. 
This means that the favorite gets odds of about 3 to 2 on the average. 

These data on betting odds are presented for their own interest, 
rather than for any contribution they make to the problem of estimat- 
ing the probability associated with the better team. The bettors seem 
to have ability to pick the winning team, and as time goes on the bettors 
are getting more confident of their judgment. The strategy (since 1935) 
seems to be to pick the American League team unless that team is the 
Browns, and if the Yankees are the American League team, good odds
are found as low as 1 to 2. 

Finally, we have traced the financial successes and set-backs of 
dyed-in-the-wool American League bettors and dyed-in-the-wool Na- 
tional League bettors. It is assumed that each year $100 was wagered 
on, say, the American League team, at the odds prevailing, and then 
depending on the outcome of the Series, either a profit was made or a 
loss was taken. At the end of 36 years (1916-1951), a gambler betting 
only on the American League team would have been ahead $556. At 
the end of the same 36 years, another gambler betting only on the 
National League would have been behind $808.


SUMMARY


We have used two methods to estimate 1) the percentage of times 
the American League has had the better team, and 2) the average 
p-value for better teams. The first method (Model A) made the un- 
realistic assumption that the better team had the same chance of 
winning every year, and the second method (Model B) made the un- 
realistic assumption that the p-values for American League teams were 
normally distributed from year to year. Both methods have the as- 
sumption that the probability of winning single games is constant 
within a Series, and that the outcomes of games within a Series are 
independent. Two checks on the binomial assumption showed no reason 
to reject it—better, they showed good agreement with it. For the per- 
centage of times the American League has had the better team the two 
methods of estimating led to results of 76 and 72, or in round numbers 
75 per cent. For the average single-game p-value for better teams, the 
estimates are 0.655 and 0.634, or in round numbers about 0.65. The 
two methods yield fairly close agreement. 


APPENDIX 
Derivation of $\sigma^2(\hat{p})$


To derive the variance of $\hat{p}$ we write below all the outcomes (x, y),
the estimates corresponding to these outcomes, the probabilities of
these estimates, and then we compute the second raw moment of $\hat{p}$.

Outcomes       (4,0)    (4,1)      (4,2)         (4,3)         (3,4)         (2,4)         (1,4)      (0,4)
$\hat{p}$      1        3/4        3/5           3/6           3/6           2/5           1/4        0
$P(\hat{p})$   $p^4$    $4p^4q$    $10p^4q^2$    $20p^4q^3$    $20p^3q^4$    $10p^2q^4$    $4pq^4$    $q^4$

$$\sum \hat{p}^2 P(\hat{p}) = p^4 + \tfrac{9}{4}p^4q + \tfrac{18}{5}p^4q^2 + 5p^4q^3 + 5p^3q^4 + \tfrac{8}{5}p^2q^4 + \tfrac{1}{4}pq^4 = E(\hat{p}^2).$$

To get the variance $\sigma^2(\hat{p})$ we subtract $p^2$, but in the form

$$p^2 = p^4 + 2p^4q + 3p^4q^2 + 4p^4q^3 + 4p^3q^4 + p^2q^4.$$

This gives

$$\sigma^2(\hat{p}) = pq\left[\tfrac{1}{4}p^3 + \tfrac{3}{5}p^3q + p^3q^2 + p^2q^3 + \tfrac{3}{5}pq^3 + \tfrac{1}{4}q^3\right].$$

Now the probabilities of the outcomes for a 5-game series have to add
up to unity, so we write 1/4 in the form

$$\tfrac{1}{4} = \tfrac{1}{4}\left(p^3 + 3p^3q + 6p^3q^2 + 6p^2q^3 + 3pq^3 + q^3\right).$$

This relation simplifies $\sigma^2(\hat{p})$ to

$$\sigma^2(\hat{p}) = \frac{pq}{20}\left[5 - pq\left(3p^2 + 10p^2q + 10pq^2 + 3q^2\right)\right].$$

Similarly a 3-game series has outcomes whose probabilities add to
unity, and we write 3 in the form

$$3 = 3\left(p^2 + 2p^2q + 2pq^2 + q^2\right).$$

This relation simplifies $\sigma^2(\hat{p})$ to

$$\sigma^2(\hat{p}) = \frac{pq}{20}\left[5 - pq\left(3 + 4p^2q + 4pq^2\right)\right],$$

$$\sigma^2(\hat{p}) = \frac{pq}{20}\left[5 - 3pq - 4p^2q^2\right].$$

When p = 1/2, $\sigma^2(\hat{p}) = 1/20$. A non-truncated binomial variance with
p = 1/2 would need n = 5 to yield pq/n = 1/20, so we might say that the
effective number of observations for estimating p in a 7-game series is
approximately 5. The effective number for estimating p is less than the
average number of games, which is 5 13/16 when p = 1/2.
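The derivation can be checked by direct enumeration of the eight outcomes. The sketch below is ours, not part of the original computation: it uses exact fractions, takes the estimates and probabilities from the table of outcomes above, and compares the enumerated variance with the closed form $\frac{pq}{20}[5 - 3pq - 4p^2q^2]$.

```python
from fractions import Fraction
from math import comb


def series_outcomes(p: Fraction):
    """Yield (estimate, probability, games played) for each seven-game-series result."""
    q = 1 - p
    for losses in range(4):      # team wins the Series 4 to `losses`
        ways = comb(3 + losses, losses)
        yield Fraction(3, 3 + losses), ways * p**4 * q**losses, 4 + losses
    for wins in range(4):        # team loses the Series `wins` to 4
        ways = comb(3 + wins, wins)
        yield Fraction(wins, 3 + wins), ways * p**wins * q**4, 4 + wins


def variance_by_enumeration(p: Fraction) -> Fraction:
    outcomes = list(series_outcomes(p))
    mean = sum(est * prob for est, prob, _ in outcomes)
    second_moment = sum(est**2 * prob for est, prob, _ in outcomes)
    return second_moment - mean**2


def variance_closed_form(p: Fraction) -> Fraction:
    q = 1 - p
    return p * q / 20 * (5 - 3 * p * q - 4 * (p * q) ** 2)


def expected_games(p: Fraction) -> Fraction:
    return sum(games * prob for _, prob, games in series_outcomes(p))


half = Fraction(1, 2)
print(variance_by_enumeration(half), variance_closed_form(half), expected_games(half))
```

Because the arithmetic is exact, agreement at several values of p is a direct check on the algebra rather than a numerical approximation.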


Means of Truncated Normal Distributions 


If f(x) is a probability density function on the interval $(-\infty, \infty)$
and we wish to evaluate the mean value of x given that x > a, we can
use the expression

$$E(x \mid x > a) = \frac{\int_a^{\infty} x f(x)\,dx}{\int_a^{\infty} f(x)\,dx}.$$

In our problem we are interested in the average p for the better team.
We have assumed that p is normally distributed for American League
teams. The American League team is better when p > 1/2, the National
League team is better when p < 1/2. We need to compute the mean p
for the American League when it is better, and the mean p for the
National League when it is better. These two means can then be
weighted by their estimated frequency of occurrence to give a final
weighted mean estimate of the average probability that the better
team wins single games.

To obtain the truncated normal estimate for the average p for better
teams, we first set up the normal distribution with mean $\bar{p}$ and standard
deviation $\hat{\sigma}$. For printing convenience, we drop the bar and circumflex.
When the American League has the better team, the mean p-value
is

$$\text{(a)}\qquad \frac{\dfrac{1}{\sigma\sqrt{2\pi}} \displaystyle\int_{0.5}^{\infty} x\, e^{-\frac{1}{2}[(x-p)/\sigma]^2}\,dx}{P(\text{A.L. better})} = \frac{\sigma \dfrac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}[(0.5-p)/\sigma]^2} + p\,P(\text{A.L. better})}{P(\text{A.L. better})}.$$

When the National League has the better team the mean p is

$$\text{(b)}\qquad \frac{\dfrac{1}{\sigma\sqrt{2\pi}} \displaystyle\int_{0.5}^{\infty} x\, e^{-\frac{1}{2}[(x-(1-p))/\sigma]^2}\,dx}{P(\text{N.L. better})} = \frac{\sigma \dfrac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}[(p-0.5)/\sigma]^2} + (1-p)\,P(\text{N.L. better})}{P(\text{N.L. better})}.$$

Weighting the contributions (a) and (b) by their probabilities of occurrence
gives

$$\text{Final estimate} = 2\sigma \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}[(p-0.5)/\sigma]^2} + p\,P(\text{A.L. better}) + (1-p)\,P(\text{N.L. better}).$$

With $\sigma$ = 0.1315, p = 0.583, $(p - 0.5)/\sigma$ = 0.63, P(A.L. better) = 0.74, we
get

Estimate = 2(0.1315)(0.3271) + (0.583)(0.74) + 0.417(0.26) = 0.626,

as reported in the text. Using $\sigma$ = 0.1439 from our second approximation,
we have p = 0.583, $(p - 0.5)/\sigma$ = 0.576, P(A.L. better) = 0.718, and
the improved approximation is 0.634, as the average single-game probability
for the better team.
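The final estimate is simple to evaluate directly. The following sketch implements the weighted formula above with the normal density and distribution function built from the standard library; the function names are ours. Because it carries unrounded intermediate values, its results can differ from the figures in the text in the third decimal place.

```python
import math


def normal_pdf(z: float) -> float:
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z: float) -> float:
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def better_team_mean_p(mean_p: float, sd: float) -> float:
    """Average single-game probability for the better team.

    Implements 2*sd*phi(z) + p*P(A.L. better) + (1 - p)*P(N.L. better),
    with z = (p - 0.5)/sd.
    """
    z = (mean_p - 0.5) / sd
    p_al_better = normal_cdf(z)
    p_nl_better = 1.0 - p_al_better
    return 2.0 * sd * normal_pdf(z) + mean_p * p_al_better + (1.0 - mean_p) * p_nl_better


first_estimate = better_team_mean_p(0.583, 0.1315)
improved_estimate = better_team_mean_p(0.583, 0.1439)
print(round(first_estimate, 3), round(improved_estimate, 3))
```

When the mean is exactly 0.5 the formula reduces to $0.5 + 2\sigma/\sqrt{2\pi}$, which is a convenient sanity check on the implementation.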





AN ANALYSIS OF VARIANCE FOR 
PAIRED COMPARISONS* 


Henry Scheffé
Columbia University


In a paired comparison test of m brands of a product each
of the ½m(m−1) pairs is presented to 2r judges: to r in one
order, and to r in the other. An analysis of variance is developed
for the case in which the judges’ preferences are expressed
on a 7- or 9-point scale. Account is taken of the effects of order
of presentation. Main effects are defined for the brands. The
hypothesis of subtractivity, analogous to the hypothesis of
additivity in a two-way layout, states roughly that the results
for any pair, after order effects are eliminated, can be attributed
entirely to the difference of the main effects of the two brands
in the pair. Significance tests for the main effects, for the order
effects, and for the hypothesis of subtractivity are given, as
well as estimates of various parameters and their standard
errors. The main effects are analyzed by considering all possible
comparisons. A numerical example illustrates the method.


1. THE EXPERIMENT


A method of analyzing paired comparison experiments is developed
in this paper for experiments in which preferences are expressed
on a scale of 7 or more points (explained below). The method has not 
yet been applied very extensively and it will be interesting to collect 
more experience to see how well it works in practice. The method falls 
under the general theory of least squares and linear hypotheses, and 
the tests and point estimates given are the optimum tests and estimates 
established in that theory, with the exception of the significance test 
suggested for the main effects. 

We shall expound the method in the case of consumer preference 
tests, such as taste-testing of foods or use-testing of hand lotions, al- 
though it would be applicable to other cases of paired comparisons, 
such as testing of attitudes, or response to physical or psychological 
stimuli. Suppose m brands are to be compared. We number them 
1, 2, ..., m. All the possible M pairs are formed, where¹


(1.1*) M = ½m(m − 1).





* The illustrative data in this paper were originally analyzed by a different method for Consumers’
Union of the U. S., Inc. The development of the present method and the exposition were sponsored by
the Office of Naval Research.

1 An asterisk attached to the number of an equation indicates the formula is used in the actual
numerical analysis of the data.

Each pair, say i and j, is presented to 2r judges, to r judges in the order
(i, j), and to r in the order (j, i). The order will usually mean the
temporal order in which the brands are tried, for example, the order
in which the brands of food are tasted. In a hand lotion test on a group
of right-handed women, where both brands are used simultaneously,
one on the right hand and the other on the left, the order (i, j) could
mean that i is on the left and j on the right, while (j, i) means the
reverse.

It is assumed now that the number of judges per pair is at least
four.² Each judge states his preference and this is converted to a
numerical score. We shall denote by $x_{ijk}$ the preference for i over j of
the kth of the r judges presented with the pair i and j in the order
(i, j). The kth judge of the ordered pair (1, 2) will not be the same as
the kth judge on (1, 3) or (2, 1), etc. It is assumed³ that each judge
judges just one ordered pair, requiring 2rM judges, so that the judge
giving the score $x_{ijk}$ is identified not by k but by the ordered triple
i, j, k of subscripts.

In a 7-point scoring system the judge presented with the ordered
pair (i, j) makes one of the following 7 statements:


(3) I prefer i to j strongly.
(2) I prefer i to j moderately.
(1) I prefer i to j slightly.
(0) No preference.
(−1) I prefer j to i slightly.
(−2) I prefer j to i moderately.
(−3) I prefer j to i strongly.


The corresponding values of the scores $x_{ijk}$ might be those shown in
parentheses. In any event it is assumed that the numerical scores increase
with the strength of the preference for i over j, and that equal
but opposite preferences (i over j, and j over i) correspond to equal but
opposite scores. We therefore do not need a new symbol for the preference
of the judges for i over j when presented in the order (j, i), since
it would be $-x_{jik}$ for the kth judge presented with the pair in this
order.

It will increase our understanding of this kind of experiment if we
raise the question, why not have each judge assign separate scores to
the two brands of the pair presented to him and then take the difference?
One expects the present method to be more sensitive than this;
for example, if the judge independently gave the two brands the same
score, he might still find a slight preference for one. If we imagine him
working on a linear scoring scale, it is as though he applied a magnifying
glass to see how far apart the brands were. However, a magnifying
glass may produce distortion as well as magnification; such distortion
will invalidate the hypothesis of subtractivity discussed in Section 3.

2 If there are only two judges per pair the modified method of Section 7 may be applicable. It requires
more restrictive underlying assumptions.

3 If the same judge is used on several pairs it would be desirable to balance the experiment so that
each judge is on each brand the same number of times.

4 Actually the judges do not know the numbers i and j; all judges refer to their (i, j) as (1, 2).

5 For some recent work by Mosteller on the two-point scoring system see [10]. For an analysis of
experiments in which the judges rank more than two brands see [1].


2. THE MATHEMATICAL MODEL⁶


The underlying assumptions of our mathematical model are that all
the $x_{ijk}$ are independent random variables, and that for a fixed ordered
pair (i, j) all r variables $x_{ijk}$ have the same mean $\mu_{ij}$ and the same
variance $\sigma^2$ which does not depend on (i, j). For some purposes we will
want to add the normality assumption that the $x_{ijk}$ are normal, which
can only be approximately satisfied.

The score assigned by a judge on a fixed ordered pair (i, j) may be
thought of as the sum of two components, (a) one characteristic of the
judge and representing his own average taste, and (b) the other, the
chance deviation of the judge from his own average. Component (a) is
a random variable because the judge is sampled from a population. It
is not assumed that for all judges in the population the component
(a) equals $\mu_{ij}$, but rather the meaning of $\mu_{ij}$ would be the mean of the
component (a) in the population. That the $x_{ijk}$ in practice have equal
variance is questionable, and that they are normal is certainly false.
The writer conjectures that while the practical consequences of heterogeneity
of variance may be serious, those of non-normality are not (in
general in the analysis of variance). There is some discussion of the
first question in Section 8.

6 Among the reasons the author found this model mathematically interesting is the following: He
once thought that analysis of variance (Model I) might be defined as the special case of the general
linear hypothesis in which the parameters can be chosen so that they all enter the regression equations
with coefficients 0 or 1. This is not true here, as is seen most simply in equation (7.1).

The mean preference for brand i over brand j when presented in the
order (i, j) is then $\mu_{ij}$, and the mean preference for i over j in the order
(j, i) is $-\mu_{ji}$. The average of these two means $\mu_{ij}$ and $-\mu_{ji}$ will be
denoted by $\pi_{ij}$,

(2.1) $\pi_{ij} = \tfrac{1}{2}(\mu_{ij} - \mu_{ji}),$

and their difference by $2\delta_{ij}$,

(2.2) $\delta_{ij} = \tfrac{1}{2}(\mu_{ij} + \mu_{ji}).$

Thus $2\delta_{ij}$ is the difference due to order of presentation in the mean
preference for i over j, while $\pi_{ij}$ is the average preference for i over j,
averaged over the two orders. We note

(2.3) $\pi_{ij} = -\pi_{ji}, \qquad \delta_{ij} = \delta_{ji}.$

We may work either with the set of parameters $\mu_{ij}$ or the set $\pi_{ij}$ and
$\delta_{ij}$. The latter set is in some ways more convenient; in terms of these
the expected value of the observation $x_{ijk}$ may be written

(2.4) $E(x_{ijk}) = \pi_{ij} + \delta_{ij},$

where we must remember that the new parameters satisfy (2.3). The
expected value of the preference for i over j in the order (j, i) is then

$E(-x_{jik}) = \pi_{ij} - \delta_{ij}.$

If there is any interest attached to the order effects $\delta_{ij}$ beyond trying
to eliminate their effects from the experiment, it is worthwhile introducing
the average order effect

(2.5) $\delta = \sum_{i<j}{}' \; \delta_{ij} / M,$

where the primed summation sign denotes the sum over all i and j with
i < j. The parameter $2\delta$ then measures the average advantage to a
brand i of being in the order (i, j) rather than (j, i), averaged over all
2M ordered pairs.
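The change of parameters in (2.1) to (2.4) can be illustrated with a pair of made-up means. In the sketch below the values of `mu_12` and `mu_21` are hypothetical, chosen only so that the arithmetic is exact, and the function name is ours.

```python
def split_means(mu_ij: float, mu_ji: float) -> tuple[float, float]:
    """Return (pi_ij, delta_ij) as defined in (2.1) and (2.2)."""
    pi_ij = 0.5 * (mu_ij - mu_ji)
    delta_ij = 0.5 * (mu_ij + mu_ji)
    return pi_ij, delta_ij


# Hypothetical mean scores: brand 1 over 2 in order (1, 2), and 2 over 1 in order (2, 1).
mu_12, mu_21 = 2.0, -1.5

pi_12, delta_12 = split_means(mu_12, mu_21)
pi_21, delta_21 = split_means(mu_21, mu_12)

# (2.4): expected score in order (1, 2); and expected preference for 1 over 2 in order (2, 1).
expected_in_order_12 = pi_12 + delta_12
expected_in_order_21 = pi_12 - delta_12

print(pi_12, delta_12, expected_in_order_12, expected_in_order_21)
```

With these values brand 1 is preferred on average by 1.75 scale units, and appearing first in the ordered pair is worth 0.25 units in either direction, so the relations (2.3) hold by construction.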


3. THE HYPOTHESIS OF SUBTRACTIVITY 


The hypothesis of subtractivity (which we shall see can be statistically
tested) is that there exist parameters $\alpha_1, \alpha_2, \ldots, \alpha_m$ characterizing the
$m$ brands, such that the average preference $\pi_{ij}$ for $i$ over $j$ is the
difference of the corresponding parameters,

(3.1)    $\pi_{ij} = \alpha_i - \alpha_j$.


Since only the differences of the parameters matter, we may without
loss of generality add the convenient assumption that their sum is zero,

(3.2)    $\sum_{i=1}^{m} \alpha_i = 0$.

If the hypothesis of subtractivity is accepted then a simple way of
rating the $m$ brands relatively is by estimating the $\alpha_i$. What can be
done when the hypothesis of subtractivity is rejected we shall discuss
below. To illustrate the notion, consider the estimated average preferences
$\hat\pi_{ij}$ which were obtained in an experiment with $m=4$, $r=12$,
shown in the second column of Table 3.1:


TABLE 3.1

COMPARISON OF OBSERVED AVERAGE PREFERENCES $\hat\pi_{ij}$
WITH VALUES $\hat\alpha_i - \hat\alpha_j$ ESTIMATED UNDER
HYPOTHESIS OF SUBTRACTIVITY

(i, j)      $\hat\pi_{ij}$
(1, 2)       1.71
(1, 3)        .50
(1, 4)       1.67
(2, 3)      -1.62
(2, 4)        .67
(3, 4)       1.46

The complete data for this experiment will be given in Section 6. The
estimates $\hat\pi_{ij}$ of the average preferences $\pi_{ij}$ under our underlying
assumptions are calculated from formulas (4.1*) and (4.2*) below; under
the hypothesis of subtractivity the estimates $\hat\alpha_i$ of the parameters $\alpha_i$
are given by

(3.3)    $\hat\alpha_i = \sum_{j=1}^{m} \hat\pi_{ij}/m$,

where $\hat\pi_{ii}$ is defined to be zero; these and all other estimates in this
paper except $\hat\sigma^2$ are least squares estimates. The numerical values of
the $\hat\alpha_i$ are listed in (6.1) below. Do the six differences calculated from
these four $\hat\alpha_i$ sufficiently well reproduce the actually observed averages
$\hat\pi_{ij}$? More precisely, should we accept the hypothesis of subtractivity?
We shall answer the question for this specific case in Section 6. Let us
now return to generalities:

The estimation formula (3.3), even though derived under the hypothesis
of subtractivity, suggests that in the general case (where subtractivity
is not assumed) we might define parameters $\alpha_i$ from the formula

(3.4)    $\alpha_i = \sum_{j=1}^{m} \pi_{ij}/m$,

where $\pi_{ii}$ is defined to be zero. In the general case these parameters $\alpha_i$
would then be estimated by (3.3). The $\alpha_i$ may be regarded as somewhat
analogous to the main effects in the analysis of variance of a two-way
layout, and we shall henceforth call them the main effects. Without any
loss of generality we may then write in the general case

         $\pi_{ij} = \alpha_i - \alpha_j + \gamma_{ij}$,

where the quantities $\gamma_{ij}$ thus defined are easily seen to satisfy the
conditions

         $\sum_{j=1}^{m} \gamma_{ij} = 0 \quad (i = 1, 2, \ldots, m)$.

The $\gamma_{ij}$ are given explicitly in terms of the $\pi_{ij}$ by the formula

(3.5)    $\gamma_{ij} = \pi_{ij} - \sum_{h=1}^{m} \pi_{ih}/m + \sum_{h=1}^{m} \pi_{jh}/m$.

They may be called the deviations from subtractivity, and are analogous
to the interactions in a two-way layout.⁷ It is easily shown that if the
deviations $\gamma_{ij}$ are defined from (3.5) then the hypothesis of subtractivity
is true (that is, there exist constants $\alpha_i$ such that the $\pi_{ij}$ satisfy
(3.1)) if and only if all the $\gamma_{ij}$ are zero. The usefulness and interpretation
of the estimates $\hat\alpha_i$ is not so clear if the hypothesis of subtractivity
is violated, analogously to that of the estimates of the main effects in a
two-way layout when the hypothesis of additivity is violated.
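
The definitions (3.3) through (3.5) can be illustrated with a short Python sketch. It takes the six estimated average preferences of the example (the three-decimal values of Table 6.1), forms the main effects as row means of the antisymmetric array, and then compares each observed preference with the fitted difference $\hat\alpha_i - \hat\alpha_j$. The function name `main_effects` and the dictionary layout are assumptions chosen for illustration.

```python
def main_effects(pi_upper, m):
    """pi_upper maps (i, j) with i < j to pi_ij; returns (alpha, gamma)."""
    # Fill the full antisymmetric array: pi_ji = -pi_ij and pi_ii = 0.
    pi = [[0.0] * m for _ in range(m)]
    for (i, j), value in pi_upper.items():
        pi[i - 1][j - 1] = value
        pi[j - 1][i - 1] = -value
    # (3.3)/(3.4): alpha_i is the sum of the i-th row divided by m.
    alpha = [sum(row) / m for row in pi]
    # gamma_ij = pi_ij - alpha_i + alpha_j, which is equivalent to (3.5).
    gamma = [[pi[i][j] - alpha[i] + alpha[j] for j in range(m)]
             for i in range(m)]
    return alpha, gamma

pi_hat = {(1, 2): 1.708, (1, 3): 0.500, (1, 4): 1.667,
          (2, 3): -1.625, (2, 4): 0.667, (3, 4): 1.458}
alpha_hat, gamma_hat = main_effects(pi_hat, m=4)

for (i, j), observed in pi_hat.items():
    fitted = alpha_hat[i - 1] - alpha_hat[j - 1]
    print((i, j), f"observed {observed:6.2f}", f"fitted {fitted:6.2f}")
```

The fitted column is what the hypothesis of subtractivity predicts for each pair; the entries of `gamma_hat` are the corresponding deviations, and each of its rows sums to zero.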

It may be helpful at this point to bring together the different effects
that have been introduced. If we write $e_{ijk}$ for the “error” in $x_{ijk}$, that
is, $e_{ijk} = x_{ijk} - \mu_{ij}$, then

(3.6)    $x_{ijk} = \underbrace{\underbrace{\underbrace{(\alpha_i - \alpha_j)}_{S_\alpha} + \underbrace{\gamma_{ij}}_{S_\gamma}}_{S_\pi} + \underbrace{\underbrace{\delta}_{2rM\hat\delta^2} + \underbrace{(\delta_{ij} - \delta)}_{S_\delta'}}_{S_\delta}}_{S_\mu} + \underbrace{e_{ijk}}_{S_e}$.

The symbols below the horizontal brackets in (3.6) refer to certain
sums of squares associated with the effects, to be defined in the next
section.





7 The analogy in the formulas could be made complete by writing $\gamma_{ij} = \pi_{ij} - \pi_{i\cdot} - \pi_{\cdot j} + \pi_{\cdot\cdot}$, and
defining the symbols with the dots in the usual way.





4. FORMULAS

Under our underlying assumptions (the hypothesis of subtractivity
not assumed) the least squares estimates of the parameters (other than
$\sigma^2$) we have introduced are

(4.1*)   $\hat\mu_{ij} = \sum_{k=1}^{r} x_{ijk}/r \quad (i \ne j), \qquad \hat\mu_{ii} = 0$,

(4.2*)   $\hat\pi_{ij} = \tfrac{1}{2}(\hat\mu_{ij} - \hat\mu_{ji})$,

(4.3)    $\hat\delta_{ij} = \tfrac{1}{2}(\hat\mu_{ij} + \hat\mu_{ji})$,

(4.4*)   $\hat\alpha_i = \sum_{j=1}^{m} \hat\pi_{ij}/m$ (where $\hat\pi_{ii} = 0$),

(4.5)    $\hat\gamma_{ij} = \hat\pi_{ij} - \hat\alpha_i + \hat\alpha_j$.

The usual unbiased estimate of $\sigma^2$ is

(4.6*)   $\hat\sigma^2 = S_e/[2M(r - 1)]$,

where the error sum of squares $S_e$ is defined by (4.12) but calculated
from (4.16*), and $M$ is defined by (1.1*).
The analysis of variance that will ordinarily be made involves the
following sums of squares of the above estimates:

(4.7*)   $S_\mu = r \sum_{i=1}^{m} \sum_{j=1}^{m} \hat\mu_{ij}^2$,

(4.8*)   $S_\pi = 2r \sum_{i<j}' \hat\pi_{ij}^2$,

where the primed summation sign denotes the sum over all $i$ and $j$ with
$i<j$,

(4.9)    $S_\delta = 2r \sum_{i<j}' \hat\delta_{ij}^2$,

(4.10*)  $S_\alpha = 2rm \sum_{i=1}^{m} \hat\alpha_i^2$,

(4.11)   $S_\gamma = 2r \sum_{i<j}' \hat\gamma_{ij}^2$;

the error sum of squares,

(4.12)   $S_e = \sum_{i=1}^{m} \sum_{j=1}^{m} \sum_{k=1}^{r} (x_{ijk} - \hat\mu_{ij})^2$,

where the symbol $x_{iik}$ is defined to be zero; and the total sum of squares

(4.13*)  $S_t = \sum_{i=1}^{m} \sum_{j=1}^{m} \sum_{k=1}^{r} x_{ijk}^2 \quad (x_{iik} = 0)$.

These sums of squares satisfy the following identities:

(4.14*)  $S_\delta = S_\mu - S_\pi$,
(4.15*)  $S_\gamma = S_\pi - S_\alpha$,
(4.16*)  $S_e = S_t - S_\mu$.
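
The identities (4.14*) and (4.16*) can be checked numerically. The sketch below is a minimal Python illustration, not the computing routine of the paper: the function name `sums_of_squares`, the dictionary layout, and the small data set (three brands, two judges per ordered pair) are all hypothetical and serve only to exercise the formulas.

```python
def sums_of_squares(x, m, r):
    """x[(i, j)] is the list of r scores for the ordered pair (i, j), i != j."""
    pairs = [(i, j) for i in range(1, m + 1) for j in range(i + 1, m + 1)]
    mu = {p: sum(obs) / r for p, obs in x.items()}                     # (4.1*)
    pi = {(i, j): 0.5 * (mu[i, j] - mu[j, i]) for i, j in pairs}       # (4.2*)
    delta = {(i, j): 0.5 * (mu[i, j] + mu[j, i]) for i, j in pairs}    # (4.3)
    s_mu = r * sum(v * v for v in mu.values())                         # (4.7*)
    s_pi = 2 * r * sum(v * v for v in pi.values())                     # (4.8*)
    s_delta = 2 * r * sum(v * v for v in delta.values())               # (4.9)
    s_e = sum((v - mu[p]) ** 2 for p, obs in x.items() for v in obs)   # (4.12)
    s_t = sum(v * v for obs in x.values() for v in obs)                # (4.13*)
    return {"S_mu": s_mu, "S_pi": s_pi, "S_delta": s_delta,
            "S_e": s_e, "S_t": s_t}

# Hypothetical scores for m = 3 brands and r = 2 judges per ordered pair.
scores = {(1, 2): [2, 1], (2, 1): [-1, -3], (1, 3): [0, 1],
          (3, 1): [1, -2], (2, 3): [-2, -1], (3, 2): [3, 0]}
ss = sums_of_squares(scores, m=3, r=2)
print(ss)
```

Here $S_\delta$ and $S_e$ are computed directly from their definitions, so agreement with $S_\mu - S_\pi$ and $S_t - S_\mu$ is a genuine check rather than a restatement.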


Ordinarily only the starred formulas will be needed in the numerical
computations. In order to have all the needed formulas collected in one
place we list here also the following four which will be used less
frequently:

(4.17)   $\hat\delta = \sum_{i<j}' \hat\delta_{ij}/M$,

(4.18*)  $\hat\delta = \sum_{i=1}^{m} \sum_{j=1}^{m} \hat\mu_{ij}/(2M)$,

(4.19)   $S_\delta' = 2r \sum_{i<j}' (\hat\delta_{ij} - \hat\delta)^2$,

(4.20*)  $S_\delta' = S_\delta - 2rM\hat\delta^2$.

The variances of some of the above estimates are

(4.21)   $\mathrm{Var}(\hat\pi_{ij}) = \sigma^2/(2r)$,
(4.22)   $\mathrm{Var}(\hat\alpha_i) = \sigma^2 (m - 1)/(2rm^2)$,
(4.23*)  $\mathrm{Var}(\hat\alpha_i - \hat\alpha_j) = \sigma^2/(rm) \quad (i \ne j)$,
(4.24)   $\mathrm{Var}(\hat\delta) = \sigma^2/(2rM)$.


From formula (4.23*) it is possible to derive the following “yardstick”
$Y_\epsilon$ for making all the comparisons among the main effects; its
use will be explained in Section 6. Choose a confidence coefficient $1 - \epsilon$;
the usual practice would be to take $\epsilon = .05$. From tables [3, 13, 12] of
the Studentized range $q$ find $q_{1-\epsilon}$, the upper $\epsilon$ point of $q$, entering the
tables for a range of $m$ variates and for $\nu = 2M(r - 1)$ degrees of
freedom.⁸ Then

(4.25*)  $Y_\epsilon = q_{1-\epsilon} \sqrt{\hat\sigma^2/(2rm)}$,

where $\hat\sigma^2$ is given by (4.6*).

8 The precise definition of the random variable $q$ is $q = w/s$, where $w$ is the range of a sample of $m$
independent standard normal deviates and $\nu s^2$ is an independent chi-square variable with $\nu$ degrees of
freedom.
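
Formula (4.25*) is a one-line computation once $q_{1-\epsilon}$ has been read from the tables. The Python sketch below takes the tabled value as an input rather than computing it; the function name `yardstick` is an assumption, and the numbers are those quoted in Section 6 for the taste-testing example.

```python
import math

def yardstick(q, error_mean_square, r, m):
    """Y = q * sqrt(sigma_hat^2 / (2 r m)), formula (4.25*)."""
    return q * math.sqrt(error_mean_square / (2 * r * m))

# Section 6 values: q = 3.69 for m = 4 variates and nu = 132 degrees of
# freedom, with error mean square 2.711 and 2r = 24 judges per pair.
y_05 = yardstick(q=3.69, error_mean_square=2.711, r=12, m=4)
print(round(y_05, 3))
```

Because $\mathrm{Var}(\hat\alpha_i - \hat\alpha_j) = \sigma^2/(rm)$ is twice $\sigma^2/(2rm)$, the quantity under the square root plays the role of the variance of a single mean in the usual Studentized range argument.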


5. DISTRIBUTION THEORY


Under the underlying assumptions of Section 2 the estimates of
$\mu_{ij}$, $\pi_{ij}$, $\delta_{ij}$, $\alpha_i$, $\gamma_{ij}$ given above by (4.1*) through (4.5) have the usual
desirable properties guaranteed by the theory of least squares, including
unbiasedness. Their variances and covariances can be computed in
a straightforward way since they are linear combinations of the $\hat\mu_{ij}$
(with $i \ne j$) which are independent with equal variance $\sigma^2/r$.

If we add the normality assumption of Section 2, we obtain the following
further relations: The above set of estimates is statistically
independent of $\hat\sigma^2$, and the set has a joint multivariate normal
distribution. The four sums of squares $S_\alpha$, $S_\gamma$, $S_\delta$, $S_e$ are statistically
independent. Under our underlying assumptions $S_e/\sigma^2$ has the chi-square
distribution. If further the hypothesis of subtractivity is satisfied (all
$\gamma_{ij} = 0$), then $S_\gamma/\sigma^2$ is also chi-square; if the main effects $\alpha_i$ are all zero,
then $S_\alpha/\sigma^2$ is chi-square; if the order effects $\delta_{ij}$ are all zero, then $S_\delta/\sigma^2$
is chi-square; the respective degrees of freedom are shown in Table 5.1.
The table is arranged so that the sum of squares and number of degrees
of freedom below each horizontal line (except the top three) is the
sum of the two entries immediately above.


TABLE 5.1
SUBDIVISION OF THE TOTAL SUM OF SQUARES

Source                            Sum of Squares    Degrees of Freedom
----------------------------------------------------------------------
Main effects                      $S_\alpha$        $m - 1$
Deviations from subtractivity     $S_\gamma$        $M - m + 1$
----------------------------------------------------------------------
Average preferences               $S_\pi$           $M$
Order effects                     $S_\delta$        $M$
----------------------------------------------------------------------
Means                             $S_\mu$           $2M$
Error                             $S_e$             $2M(r - 1)$
----------------------------------------------------------------------
Total                             $S_t$             $2rM$


The sum of squares $S_\delta$ may be further subdivided as shown in Table
5.2 into the sum of squares $S_\delta'$ and the “sum of squares” $2rM\hat\delta^2$. The
five sums of squares $S_\alpha$, $S_\gamma$, $S_\delta'$, $2rM\hat\delta^2$, $S_e$ are statistically independent.
If all the $\delta_{ij}$ are equal, then $S_\delta'/\sigma^2$ is chi-square; if the average order
effect $\delta = 0$, then $2rM\hat\delta^2/\sigma^2$ is chi-square; the respective degrees of
freedom are $M - 1$ and $1$.


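TABLE 5.2

FURTHER SUBDIVISION OF THE SUM OF SQUARES
FOR ORDER EFFECTS

Source                             Sum of Squares        Degrees of Freedom
---------------------------------------------------------------------------
Average order effect               $2rM\hat\delta^2$     $1$
Differences among order effects    $S_\delta'$           $M - 1$
---------------------------------------------------------------------------
Order effects                      $S_\delta$            $M$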











6. ANALYSIS OF THE DATA


A convenient way of tabulating the data is shown in Table 6.1.
These data were obtained in a taste-testing experiment with four
brands ($m = 4$) of a food product. For each of the six pairs ($M = 6$) of
brands, twenty-four judges ($2r = 24$) tasted the pair, 12 in each order
of serving. The scoring was on the 7-point system explained in Section
1. (The results indicate that a 9-point system would have been better;
see Section 8.) The table shows, for example, that of the 12 judges who
tasted brands 1 and 2 together in the order (1, 2), the number who
expressed a strong preference for 1 over 2 was 4, while of the 12 judges
who tasted 1 and 2 in the order (2, 1) the number who expressed a
strong preference for 1 over 2 was 6; or, in terms of our previous notation,
the number of $x_{12k}$ equal to 3 was 4, while the number of $x_{21k}$ equal
to $-3$ was 6. The entry in the column headed “total score” is $\sum_k x_{ijk}$;
this divided by $r = 12$ gives $\hat\mu_{ij}$ in the next column. Finally, $\hat\pi_{ij}$ is
calculated from (4.2*), thus $\hat\pi_{12} = \tfrac{1}{2}(1.583 + 1.833) = 1.708$, while $\hat\pi_{21}$ is not
recorded since it equals $-\hat\pi_{12}$. The $\hat\alpha_i$ are now calculated from (4.4*),
for example,

         $\hat\alpha_3 = \tfrac{1}{4}(\hat\pi_{31} + \hat\pi_{32} + 0 + \hat\pi_{34}) = \tfrac{1}{4}(-\hat\pi_{13} - \hat\pi_{23} + \hat\pi_{34})$,

the last three terms being listed in Table 6.1. The numerical values are

(6.1)    $\hat\alpha_1 = .969, \quad \hat\alpha_2 = -.667, \quad \hat\alpha_3 = .646, \quad \hat\alpha_4 = -.948$.

A partial check on the correctness of the $\hat\alpha_i$ is that their sum is zero. We
are now ready to start constructing a table like Table 5.1 with an extra
column for mean squares. We calculate $S_\alpha$, $S_\pi$, $S_\mu$ directly from (4.10*),
(4.8*), (4.7*), and enter them in the table. The total sum of squares

TABLE 6.1

DATA FROM A TASTE-TESTING EXPERIMENT WITH $m = 4$, $2r = 24$

            Frequency of scores $x_{ijk}$ equal to    Total
(i, j)      -3   -2   -1    0    1    2    3          score    $\hat\mu_{ij}$    $\hat\pi_{ij}$
------------------------------------------------------------------------------------------
(1, 2)                      3    3    2    4            19      1.583      1.708
(2, 1)       6    3    1         1    1                -22     -1.833
(1, 3)                 3    2    2    4    1            10       .833       .500
(3, 1)            4    1    2    4         1            -2      -.167
(1, 4)                           2    5    5            27      2.250      1.667
(4, 1)       2    4    3    1         2                -13     -1.083
(2, 3)       3    4    3         1    1                -17     -1.417     -1.625
(3, 2)                 1    1    1    5    4            22      1.833
(2, 4)       1    2              1    4    4            14      1.167       .667
(4, 2)       2    2    2         2    4                 -2      -.167
(3, 4)                           4    1    7            27      2.250      1.458
(4, 3)       5    1    1         1    3    1            -8      -.667
------------------------------------------------------------------------------------------
Totals      19   20   15    9   22   32   27

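TABLE 6.2

ANALYSIS OF VARIANCE OF THE DATA OF TABLE 6.1

                                   Sum of Squares                   Mean
Source                             Symbol         Value     d.f.    Square
---------------------------------------------------------------------------
Main effects                       $S_\alpha$     259.2       3     86.4
Deviations from subtractivity      $S_\gamma$       8.6       3      2.9
---------------------------------------------------------------------------
Average preferences                $S_\pi$        267.8       6
Order effects                      $S_\delta$      33.3       6      5.55
---------------------------------------------------------------------------
Means                              $S_\mu$        301.1      12
Error                              $S_e$          357.9     132      2.711
---------------------------------------------------------------------------
Total                              $S_t$          659.0     144

can be calculated quickly from the column totals at the bottom of Table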
6.1 thus

         $S_t = 3^2(19 + 27) + 2^2(20 + 32) + 1^2(15 + 22) = 659$.

This is entered into the table and the remaining sums of squares $S_\gamma$, $S_\delta$,
$S_e$ are then filled in by subtracting the entry above from that below.
The mean squares for $S_\alpha$, $S_\gamma$, $S_\delta$, $S_e$ are entered in the table by dividing
these four quantities by their respective numbers of degrees of freedom.
The resulting table is Table 6.2. From the mean square for error
we can now calculate the “yardstick” $Y_\epsilon$ from (4.25*). In our numerical
example, if we select a confidence coefficient of $1 - \epsilon = 95$ per cent, we
enter the tables [3, 13, 12] of the Studentized range for a range of $m = 4$
variates and for $\nu = 132$ degrees of freedom to find $q_{.95} = 3.69$, and thus

(6.2)    $Y_{.05} = 3.69 \sqrt{2.711/(24 \times 4)} = .620$.
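
The construction of Table 6.2 can be retraced with a short Python sketch. It starts from the twelve total scores of Table 6.1 (each is $r$ times the tabulated mean $\hat\mu_{ij}$) and applies the starred formulas of Section 4; the variable names are assumptions made for illustration. Carried at full precision the sketch gives $S_\alpha \approx 259.1$ and $S_\gamma \approx 8.7$, slightly different from the printed 259.2 and 8.6, which appear consistent with using the three-decimal $\hat\alpha_i$ of (6.1).

```python
r, m = 12, 4
M = m * (m - 1) // 2

# Total scores from Table 6.1, keyed by ordered pair (i, j).
totals = {(1, 2): 19, (2, 1): -22, (1, 3): 10, (3, 1): -2,
          (1, 4): 27, (4, 1): -13, (2, 3): -17, (3, 2): 22,
          (2, 4): 14, (4, 2): -2, (3, 4): 27, (4, 3): -8}

mu = {pair: t / r for pair, t in totals.items()}                     # (4.1*)
pairs = [(i, j) for i in range(1, m + 1) for j in range(i + 1, m + 1)]
pi = {(i, j): 0.5 * (mu[i, j] - mu[j, i]) for i, j in pairs}         # (4.2*)

def pi_full(i, j):
    """Antisymmetric extension: pi_ji = -pi_ij and pi_ii = 0."""
    if i == j:
        return 0.0
    return pi[i, j] if i < j else -pi[j, i]

alpha = [sum(pi_full(i, j) for j in range(1, m + 1)) / m
         for i in range(1, m + 1)]                                   # (4.4*)

S_alpha = 2 * r * m * sum(a * a for a in alpha)                      # (4.10*)
S_pi = 2 * r * sum(v * v for v in pi.values())                       # (4.8*)
S_mu = r * sum(v * v for v in mu.values())                           # (4.7*)
S_t = 3**2 * (19 + 27) + 2**2 * (20 + 32) + 1**2 * (15 + 22)         # column totals
S_gamma = S_pi - S_alpha                                             # (4.15*)
S_delta = S_mu - S_pi                                                # (4.14*)
S_e = S_t - S_mu                                                     # (4.16*)
error_mean_square = S_e / (2 * M * (r - 1))                          # (4.6*)

for name, value in [("S_alpha", S_alpha), ("S_gamma", S_gamma),
                    ("S_pi", S_pi), ("S_delta", S_delta),
                    ("S_mu", S_mu), ("S_e", S_e), ("S_t", S_t)]:
    print(f"{name:8s}{value:8.1f}")
```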


We shall first give the general procedure for reaching the statistical
conclusions, and then apply it to the numerical example. The conventional
test for the significance of the main effects (that is, test of the
null hypothesis that all the main effects are zero) would be made by
noting whether the mean square for the main effects is significantly
large: the significance of this and the other mean squares is judged by
taking the ratio to the error mean square and referring it to the F-tables;
in particular, if any mean square is less than the error mean
square, it is not significant. However, it is suggested that in the usual
case where one is interested in making all possible comparisons among
the brands the significance of the main effects be tested not by the
conventional F-test but by the test to be described below. The reason is
that we recommend a new method due to Tukey [15] for making all
the comparisons in analysis of variance, and if we also make the
conventional test, we might possibly reach inconsistent conclusions about
the main effects. The suggested test of the main effects is to declare
them significant at the $\epsilon$ level if and only if the largest and smallest of
the estimated main effects $\hat\alpha_i$ differ by more than the “yardstick” $Y_\epsilon$.
If we find non-significance in testing the main effects, we must conclude
that the experiment has demonstrated no over-all difference in preference
among the brands. If the main effects are significant, we will be
interested next in the mean square for deviations from subtractivity.
If this is not significant, we accept the hypothesis of subtractivity.⁹ If
we accept the hypothesis of subtractivity, the problem is reduced to
making inferences about the parameters $\alpha_i$; if we reject this hypothesis
and with it the possibility of a simple linear scaling of the brands, we
may still be interested in inferences about the parameters $\alpha_i$, but we
must then regard them as part of a more complicated picture; we shall
return to this question at the end of this section.

The comparisons of the main effects $\alpha_i$ may be made as follows: The
experiment will be said to have demonstrated a difference for any two
main effects $\alpha_i$ and $\alpha_j$ if and only if their estimates $\hat\alpha_i$ and $\hat\alpha_j$ differ by
at least the “yardstick” $Y_\epsilon$. The meaning of these inferences can be
seen by considering the following quantitative statements about the
differences: With confidence $1 - \epsilon$ we may make all $M$ statements about
the differences $\alpha_i - \alpha_j$,

(6.3)    $\hat\alpha_i - \hat\alpha_j - Y_\epsilon \le \alpha_i - \alpha_j \le \hat\alpha_i - \hat\alpha_j + Y_\epsilon$;

that is, the probability is $1 - \epsilon$ that all $M$ statements (6.3) are true
under the assumptions (including normality) of our mathematical model.¹⁰

If it is desired to analyze the order effects, we calculate $\hat\delta$ from
(4.18*) and subdivide the sum of squares $S_\delta$ (even if its mean square is
not significant) according to Table 5.2, calculating $S_\delta'$ by subtraction,
and adding a mean square column to the table. If $S_\delta'$ is significant, we
conclude that the order effects $\delta_{ij}$ are not all equal; if $2rM\hat\delta^2$ is significant,
we conclude the average order effect $\delta$ is different from zero. A
confidence interval for $\delta$ can be set up from (4.24), (4.6*), and the
t-distribution with $2M(r - 1)$ degrees of freedom.

9 Of course, after we have accumulated some experience with this method, our attitude towards
the hypothesis of subtractivity will depend also on the outcome of similar experiments in the past. If on
the basis of long experience we should be willing to accept the hypothesis of subtractivity before analysing
the experiment at hand, then we may pool the sum of squares $S_\gamma$ and the number of degrees of freedom
for deviations from subtractivity with those for error. A similar remark applies to the sum of
squares $S_\delta'$ discussed in the next paragraph. If we invariably pool both $S_\gamma$ and $S_\delta'$ with $S_e$ and use the sum
as the error sum of squares, then we have the simplified model of Section 7.

10 An exciting consequence of his method pointed out by Tukey is that it is possible to make valid
tests of hypotheses suggested by the data! For example, we might note that the brands with the two
highest $\hat\alpha_i$ are the only ones to contain ingredient $x$ and the brands with the three lowest $\hat\alpha_i$ are the only
ones to contain $y$, and this suggests comparing the average of the two highest with the average of the
three lowest. More generally, if $\sum_{i=1}^{m} c_i \alpha_i$ is a contrast, that is, the $c_i$ are known constants with $\sum_{i=1}^{m} c_i = 0$,
then the probability is $1 - \epsilon$ that the (infinite) totality of the contrasts $\sum_{i=1}^{m} c_i \alpha_i$ satisfy the inequalities

         $\sum_{i=1}^{m} c_i \hat\alpha_i - \tfrac{1}{2} Y_\epsilon \sum_{i=1}^{m} |c_i| \le \sum_{i=1}^{m} c_i \alpha_i \le \sum_{i=1}^{m} c_i \hat\alpha_i + \tfrac{1}{2} Y_\epsilon \sum_{i=1}^{m} |c_i|$.

If contrasts other than differences $\alpha_i - \alpha_j$ are of interest, shorter confidence intervals may be obtained
by a different method, which would then also have to be used on the differences, where it gives longer
intervals. The method is stated in an abstract by the writer in the September 1952 issue of the Annals
of Mathematical Statistics.

We now illustrate the procedure of the last three paragraphs by applying
it to the data of Table 6.1. The extreme $\hat\alpha_i$ in (6.1) differ by more
than the “yardstick” $Y_{.05} = .620$ calculated in (6.2). This means the
main effects are significant at the .05 level, and we conclude that there
exist differences in the taste-desirability of the brands. In Table 6.2
the mean square for deviations from subtractivity is not very different
from the error mean square, and so we accept the hypothesis of subtractivity
and decide that the brands are adequately rated by their
(unknown) main effects $\alpha_i$. Still using our “yardstick” $Y_{.05}$ on the $\hat\alpha_i$ in
(6.1) we conclude that no difference has been demonstrated between
$\alpha_1$ and $\alpha_3$, nor between $\alpha_2$ and $\alpha_4$, but that $\alpha_1$ and $\alpha_3$ are greater than
$\alpha_2$ and $\alpha_4$. On the basis of this analysis of the experiment we must thus
accept the decision that the four brands fall into two sets of two such
that the brands in one set are preferred over those in the other, but
that there is no demonstrated difference within either set. If quantitative
statements are desired about the possible differences, we apply
(6.3) with the result that we assert with 95 per cent confidence that all
six of the following inequalities are true:

          $1.02 \le \alpha_1 - \alpha_2 \le 2.26$
          $-.30 \le \alpha_1 - \alpha_3 \le .94$
          $1.30 \le \alpha_1 - \alpha_4 \le 2.54$
         $-1.93 \le \alpha_2 - \alpha_3 \le -.69$
          $-.34 \le \alpha_2 - \alpha_4 \le .90$
           $.97 \le \alpha_3 - \alpha_4 \le 2.21$

Some further remarks about the treatment and interpretation of this 
kind of data may be found in Section 8. 

This would ordinarily be as far as we would be interested in carrying 
the analysis. However, we shall include here the analysis of the order 
effects, partly for the purpose of illustration, but also because in Section 
7 we shall consider a simpler mathematical model based on additional 
underlying assumptions about the order effects, which assumptions are 
subject to test in the present more complicated model. From (4.18*) 
we calculate 


(6.4)    $\hat\delta = .382$.


In interpreting this we should remember that the average advantage
due to order is $2\delta$ as explained at the end of Section 2, and the estimate
of this is $2\hat\delta = .764$. We can now complete Table 6.3, and we find the
interesting result that although the mean square for order effects in
Table 6.2 just misses significance at the 5 per cent level, nevertheless
when we subdivide it in Table 6.3 the component associated with the
average order effect $\delta$ is highly significant, while the component
associated with the differences of the order effects $\delta_{ij}$ among themselves is
not significant ($F < 1$). In numerical terms, the small value of the latter
component with its five degrees of freedom masks the significance of
the former with its one degree of freedom.
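
The subdivision of the order sum of squares can be retraced as follows. This Python sketch uses the total scores of Table 6.1 together with $S_\delta$ and the error mean square from Table 6.2; the variable names are assumptions, and the F ratios are simply mean squares divided by the error mean square, to be referred to F-tables as described above.

```python
r, M = 12, 6
total_scores = [19, -22, 10, -2, 27, -13, -17, 22, 14, -2, 27, -8]  # Table 6.1
S_delta = 33.3        # order effects, Table 6.2
error_ms = 2.711      # error mean square, Table 6.2

delta_hat = sum(t / r for t in total_scores) / (2 * M)   # (4.18*)
ss_average = 2 * r * M * delta_hat ** 2                  # 2rM * delta_hat^2
ss_differences = S_delta - ss_average                    # (4.20*)

F_average = ss_average / error_ms                        # 1 degree of freedom
F_differences = (ss_differences / (M - 1)) / error_ms    # M - 1 degrees of freedom

print(round(delta_hat, 3), round(ss_average, 1), round(ss_differences, 1))
print(round(F_average, 2), round(F_differences, 2))
```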
TABLE 6.3

FURTHER BREAKDOWN OF THE SUM OF SQUARES
FOR ORDER EFFECTS

                                   Sum of Squares                         Mean
Source                             Symbol                Value    d.f.    Square
---------------------------------------------------------------------------------
Average order effect               $2rM\hat\delta^2$     21.0      1      21.0
Differences among order effects    $S_\delta'$           12.3      5       2.46
---------------------------------------------------------------------------------
Order effects                      $S_\delta$            33.3      6       5.55


It remains to consider the problem facing us in experiments where
the main effects and the deviations from subtractivity are both significant.
In this case we decide the brands differ in desirability but reject
the hypothesis of subtractivity. To simplify the problem suppose now
we wish to compare brands 1 and 2. The average preference for 1 over 2
is expressed by $\pi_{12}$ of which an unbiased estimate is $\hat\pi_{12}$. However, if we
assume subtractivity, then $\hat\alpha_1 - \hat\alpha_2$ is also an unbiased estimate of $\pi_{12}$
and has smaller variance, as may be seen from (4.21) and (4.23*). If
subtractivity is not assumed, $\hat\alpha_1 - \hat\alpha_2$ is no longer an unbiased estimate
of $\pi_{12}$, although it is of course still the best unbiased estimate of $\alpha_1 - \alpha_2$.
What does $\alpha_1 - \alpha_2$ measure? It measures the relative superiority of 1
over 2 in an average sense when 1 and 2 are compared with the $m - 2$
other brands as well as with each other. Whether $\hat\pi_{12}$ or $\hat\alpha_1 - \hat\alpha_2$ is
appropriate then depends on whether we are interested in the preference
in the direct comparison of 1 and 2 or the comparison of preferences for
1 and 2 relative to all the brands tested. The design permits the latter
comparison to be made with much greater accuracy. If we are always
happy about estimating $\alpha_1 - \alpha_2$ instead of $\pi_{12}$, then the procedure is the
same whether the hypothesis of subtractivity is accepted or rejected
and one’s interest in the hypothesis becomes more academic.

The writer is indebted to Consumers’ Union of the U. S., Inc. and 
to its technical director Mr. Morris Kaplan, for permission to use the 
data of Table 6.1. 


7. ANOTHER MATHEMATICAL MODEL 


If we have only two judges for each pair (2r=2) because the com- 
parisons are difficult or expensive to make, the preceding model can 
no longer be used: the number of degrees of freedom for error would 
be zero. In order to analyze the data it is then necessary to make stricter 
underlying assumptions, or in other words, employ a model with fewer
unknown parameters. The situation is analogous to the analysis of
variance of a two-way layout with one observation per cell, in which
case we have to assume the interactions to be zero. In the present case
we might make some of the following assumptions: (i) the deviations
$\gamma_{ij}$ from subtractivity are all zero, (ii) the order effects $\delta_{ij}$ all have a
common value $\delta$, (iii) the $\delta_{ij}$ are all zero (often the most dangerous
assumption).

We shall indicate here the method for the model based on the addi- 
tion of (i) and (ii) to the underlying assumptions of Section 2. We shall 
not explicitly assume r=1 so that the model may be of use if the num- 
ber 2M(r—1) of degrees of freedom for error in the more complicated 
model is small for whatever reason. 

The expected value of $x_{ijk}$ is now of the form

(7.1)    $E(x_{ijk}) = \alpha_i - \alpha_j + \delta$,

where the meaning of $x_{ijk}$ is the same as in Section 1. The estimates of
$\alpha_i$ and $\delta$, their variances, and the variance of $\hat\alpha_i - \hat\alpha_j$ are all given by the
same formulas as in Section 4. The only difference is that the estimate
of $\sigma^2$ is now formed from a different error sum of squares $S_e'$ with a
different number of degrees of freedom, namely

(7.2*)   $(\hat\sigma')^2 = S_e'/(2rM - m)$,

where $S_e'$ is defined from

(7.3)    $S_e' = \sum_{i=1}^{m} \sum_{j=1}^{m} \sum_{k=1}^{r} (x_{ijk} - \hat\alpha_i + \hat\alpha_j - \hat\delta)^2 \quad (i \ne j)$,

and calculated from the identity

(7.4*)   $S_e' = S_t - S_\alpha - 2rM\hat\delta^2$,

the three terms on the right being calculated exactly as before. Table
7.1 shows the subdivision of the total sum of squares $S_t$.


TABLE 7.1
ANALYSIS OF VARIANCE FOR SIMPLIFIED MODEL

Source          Sum of Squares        Degrees of Freedom
--------------------------------------------------------
Main effects    $S_\alpha$            $m - 1$
Order effect    $2rM\hat\delta^2$     $1$
Error           $S_e'$                $2rM - m$
--------------------------------------------------------
Total           $S_t$                 $2rM$


The distribution theory for the estimates used here is the same as
in Section 5. If we add to the new set of underlying assumptions the
normality assumption of Section 2, we find that $S_e'/\sigma^2$ has the chi-square
distribution. The sums of squares $S_\alpha$, $2rM\hat\delta^2$, $S_e'$ are statistically
independent. If the main effects $\alpha_i$ are all zero, $S_\alpha/\sigma^2$ is chi-square; if
the order effect $\delta$ is zero, $2rM\hat\delta^2/\sigma^2$ is chi-square; the degrees of freedom
are shown in Table 7.1.

The yardstick for the comparison of main effects in this case is
calculated as

(7.5*)   $Y_\epsilon' = q_{1-\epsilon}' \sqrt{(\hat\sigma')^2/(2rm)}$,

where $q_{1-\epsilon}'$ is found from the Studentized range tables as explained in
connection with (4.25*), except that $\nu = 2rM - m$. Inferences about the
main effects may be made as explained in Section 6, except that $Y_\epsilon$ is
replaced by $Y_\epsilon'$. If $\delta$ is of interest, it may also be estimated as in Section
6. The only change is that in place of the former estimate of $\sigma^2$ we use
(7.2*) with $2rM - m$ degrees of freedom.
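
The paper does not work a numerical example for the simplified model, so the following Python sketch is only an illustration of (7.2*) and (7.4*): it applies them to the printed values of Tables 6.2 and 6.3, and checks that the new error sum of squares equals the old one pooled with $S_\gamma$ and $S_\delta'$. The variable names are assumptions.

```python
r, m = 12, 4
M = m * (m - 1) // 2

# Printed values from Tables 6.2 and 6.3.
S_t, S_alpha, ss_order = 659.0, 259.2, 21.0
S_e, S_gamma, S_delta_prime = 357.9, 8.6, 12.3

S_e_prime = S_t - S_alpha - ss_order        # (7.4*)
df_prime = 2 * r * M - m                    # degrees of freedom of S_e'
sigma2_prime = S_e_prime / df_prime         # (7.2*)

# Pooling check: S_e' = S_e + S_gamma + S_delta'.
pooled = S_e + S_gamma + S_delta_prime

print(round(S_e_prime, 1), df_prime, round(sigma2_prime, 3), round(pooled, 1))
```

The yardstick (7.5*) would then be obtained from `sigma2_prime` and a tabled value of the Studentized range for $m = 4$ variates and 140 degrees of freedom, which is not quoted in the text and is therefore left out of the sketch.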


8. SOME PRACTICAL REMARKS 


The scope of the present paper does not include non-statistical but 
vital psychological and physiological aspects of paired comparison ex- 
periments. For taste-testing experiments a considerable laboratory 


some bibliography on psychological scaling ».. [7] and [6]. 

In reporting conclusions from paired comparison experiments on
brands it is important not to imply without further evidence that a less 
preferred brand is of poor quality. In the experiment from which the 
data of Table 6.1 were obtained the judges were also asked to rate the 
flavor of each brand as good, fair, or poor. It may be worth getting 
such extra information in most of these experiments. In the present 
case, in spite of the preference demonstrated for brands 1 and 3 over 
2 and 4, the latter brands were not considered poor by the majority of 
judges tasting them: 28 per cent rated them poor, 44 per cent fair, 28 
per cent good; the corresponding figures for brands 1 and 3 were 2 per 
cent, 22 per cent, 76 per cent. 

A casual inspection of Table 6.1 indicates that too many scores are
jammed up against the ends of the scale at $\pm 3$. This could be remedied
somewhat by permitting scores $\pm 4$ under a 9-point scale, as described
in Section 1. There also appears to be a scarcity of zeros: the probability
of a judgment of a slight preference one way or the other seems in
general to be greater than that of a judgment of no preference. This
could be reflected in the scoring system by making the score difference
between “no preference” and “slight preference” smaller than the other
differences; however, the whole question of an optimum scoring system
has been left untouched in this paper.

The assumption of homogeneity of variance (that the variance of
the scores is the same for all ordered pairs $(i, j)$) can be tested
statistically by calculating the sample variance for each pair and using the
$L_1$ test¹¹ [5, 9, 11. Note small values are significant]. This test was
applied to the data of Table 6.1 and a significant result at the 5 per cent
level was obtained, indicating non-homogeneity. The non-homogeneity
is caused partly by the jamming effect mentioned in the last paragraph; 
the use of the 9-point scale suggested there would somewhat alleviate 
this as well as the biasing effect in the estimates caused by the jamming. 
The non-homogeneity due to jamming tends to make us underestimate 
the error and hence exaggerate the significance of the observed differ- 
ences of the brands. If some pairs obviously suffer from extreme jam- 
ming (such is not obvious for our data), it would be sensible to consider 
dropping the obviously low or high brands from the analysis made by 
the present method, and afterward to analyze their relation to the 
brands included in the analysis. 

The data of Table 6.1 are so unequivocal¹² that the same results,
namely that the brands fall into two groups of two, one better than the
other, are obtained by the less refined form of analysis known as the
sign test, made on each pair: by this we mean simply that for each pair
of brands we ask whether the proportion of preferences is significantly
different from $\tfrac{1}{2}$, when only the direction¹³ and not the strength of the
preference is considered.

If we consider only the analysis of a single pair, the relative efficiency
of the sign test would be roughly the well known [4] value $2/\pi = 64$ per
cent for large $r$. We say “roughly” because, first, it is not entirely clear
how order effects should be treated in the sign test, and secondly, the
value $2/\pi$ is based on giving the present method the benefit of the
normality assumption of Section 2. A disadvantage of the sign test
relative to our method is that it does not permit the data from all the
pairs to be used in the comparison of a single pair of brands,
but only the data for that particular pair; the necessary generalization
of the sign test would be Thurstone’s scaling problem [10], in which
order effects are, however, not ordinarily considered.
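
For a single pair the sign test reduces to an exact binomial calculation. The Python sketch below is an illustration only: the function name `sign_test_p_value` is an assumption, and the count of 19 preferences out of 24 judges is hypothetical, since the directions of the forced choices behind Table 6.1 are not given in the text.

```python
from math import comb, pi

def sign_test_p_value(wins, n):
    """Two-sided exact binomial test that the proportion of preferences is 1/2."""
    k = min(wins, n - wins)
    tail = sum(comb(n, x) for x in range(k + 1)) / 2 ** n   # P(X <= k) under p = 1/2
    return min(1.0, 2 * tail)

# Hypothetical pair: 19 of the 24 judges prefer the first brand.
p_value = sign_test_p_value(19, 24)

# Large-sample relative efficiency of the sign test quoted in the text.
efficiency = 2 / pi

print(round(p_value, 4), round(efficiency, 2))
```

Only the direction of each judgment enters the calculation, which is the sense in which the sign test is less refined than the analysis of the scores.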





11 Hartley’s maximum F-ratio test [8] is much quicker. His table of the 5 per cent point unfortunately
goes only up to $2M = 12$ ($m = 4$). For $2M = 20$ ($m = 5$), the 5 per cent point may be closely approximated by
$\exp\{5.01 \sqrt{2/(r - 2)}\}$.

12 Actually the number of judges per pair was chosen to give the desired sensitivity under the
cruder analysis mentioned, because the new method of analysis was not fully developed at the time.

13 Judges in this experiment were asked, “If your answer was ‘no preference’ but you were forced to
make a choice, which would it be?”

The question of how many observations should be taken can be
approached¹⁴ in terms of the desired accuracy of the estimates by
considering the variance of the estimates or the length of the “yardsticks.”
For this approach a preliminary guess of o? is necessary; this would 
have to be made from the values of the error mean square of previous 
experiments more or less similar to the contemplated experiment. The 
formulas needed may be found in Section 4. In general the sensitivity 
of the method increases with the number m of brands as well as with 
the number 2r of judges per pair. The choice of m will be determined 
by non-statistical reasons to a greater degree than the choice of 2r. 

In concluding these remarks we remind the reader that no refine- 
ments of statistical analysis can erase the blunder of selecting the 
judges in such a way that they cannot reasonably be regarded, at least 
as concerns their preferences for the objects tested, as a random sam- 
ple from the population about which we desire to draw the conclusions. 

The writer is indebted to the referees for some very helpful
criticisms.¹⁵





14 If we approach the question in terms of the power of the tests, tables and charts [14] are available
for the F-tests, but unfortunately not yet for the test based on the extreme $\hat\alpha_i$ and Tukey’s “yardstick”
(4.25*).

15 While the writer mainly disagrees with the following comments from a referee he feels the reader
should have the advantage of considering them and deciding for himself.

“As I see it the various procedures which have been called scaling procedures fall into two quite 
different classes: 

(A) The subject assigns scale numbers to the objects judged. These numbers are assumed to represent 
the scale. However, various manipulative procedures have been designed in order to check on 
whether subjects agree with one another or not, agree with themselves or not, are utilizing the scale 
as if it were linear, etc. Basically, however, the subject must have the actual scale including the 
numbers belonging to it in his head before the procedure can start. 

(B) The subject makes a series of qualitative discriminations, from which the scale numbers must be 
derived by various assumptions. 

“My own personal feeling is that the procedures of type B are considerably the superior type of 
procedure for psychological scaling. I think it can be demonstrated that the number of arbitrary assump- 
tions and the amount of instruction and indoctrination required for the subject are considerably less 
with procedures of type B than they are with procedures of type A. However, regardless of which type 
of procedure is superior, it seems to me that all writers are obligated to keep this distinction in mind 
and to make clear in their writing which type of procedure is being discussed at any moment. The failure 
of the writer to clarify this point caused considerable difficulty and discussion in reading the paper. This 
manifests itself in a number of places: 

(a) The title uses the term ‘paired comparisons.’ Probably ‘rating scale’ would be better, since paired
comparisons both as developed by Thurstone and as considered by Guttmann, Kendall, and Mosteller
deal with the procedures that seem to me to belong to type B. Hence to utilize the same term
for a radically different procedure is unnecessarily confusing to the reader.

(b) Footnote 6 referring to Mosteller's work seems to me to be definitely misleading since it implies
that his procedure is similar to the present one. Also the statement in which the footnote is introduced
makes it unnecessarily easy for the reader to infer that the procedure presented by Scheffé
is essentially the same as that which has been called paired comparisons by previous writers.

(c) For what I think is the same reason, the reference to Thurstone’s paired comparisons and law of
comparative judgment procedure toward the end of section 8 is extremely misleading. Thurstone’s 
procedure is very radically different from the one presented in this article in that it is a type B pro- 
cedure, not a type A procedure. To characterize it as simply being a generalization of the sign test 
which ignores order effects is, in my opinion, either unnecessarily confusing or else erroneous. 

(d) I feel in general that it should be made clear at the beginning and borne in mind throughout the pa- 
per that this is a procedure of type A and wherever type B procedures are mentioned it should be 
made clear that there is very little in common between the two procedures.” 
SOME NONPARAMETRIC TESTS FOR STUDENT'S
HYPOTHESIS IN EXPERIMENTAL DESIGNS


JOHN E. WALSH
U. S. Naval Ordnance Test Station*


In experimental designs the quantities investigated are often 
grouped into blocks as a method of obtaining a higher precision 
for the experiment. This grouping may result in high correla- 
tion among observations within the same block. If the posi- 
tions of the treatments within a block are assigned at random, 
the amount of correlation is difficult to determine and may 
vary greatly between treatments and from block to block. Also 
for many experiments the variances of the observations may 
differ substantially between treatments and from block to 
block. When one or both of these complications arise, the
t-statistic is not necessarily applicable for comparing the effects
of the treatments under investigation. This paper presents
some nonparametric tests of Student’s hypothesis which are
usually valid for a well-known type of experimental design if
there is statistical independence among blocks (number of
blocks ≥ 4). These nonparametric results are reasonably
efficient (compared to those based on the t-statistic) for the case
where the totality of observations are independent, normally 
distributed, and have the same variance. High precision can 
sometimes be obtained by designing the experiment to yield 
large positive correlation within blocks and then using the non- 
parametric results. 


INTRODUCTION 


THE investigations of this paper are limited to experimental designs
which are laid out in separate blocks. Within a block the position of
the treatments is determined by an independent randomization process.
The usual mathematical model for this type of design is adopted
for all analyses (including the nonparametric ones). A detailed description
of the mathematical model and the type of design considered is
presented in the section of this paper titled Mathematical Model.

On the basis of the mathematical model and some additional assump- 
tions, t-statistics can be constructed for comparing treatment effects. 
The usual additional assumptions made are 

(i) The totality of observations are statistically independent. 

(ii) They all have the same variance. 

(iii) They are all normally distributed. 


* Part of the results contained in this paper were obtained while the author was with The Rand 
Corporation. 
One of the principal purposes of this paper is to examine (i)-(iii) from
the viewpoint of validity in practical applications. This examination
is presented in the Validity of Assumptions section. It is found that in
many cases (i)-(iii) seem to be of doubtful validity even in the sense
of being sufficiently close approximations.

The nonparametric results are valid under conditions much less
restrictive than (i)-(iii). In fact, sufficient conditions for the validity of
the nonparametric tests and confidence intervals are

(I) There is statistical independence among blocks. 

(II) Within the same block, linear combinations of treatment ef- 
fects (i.e., the observations yielded by the treatments) have 
probability distributions (cumulative) which are continuous 
and symmetrical. 

If care is exercised in setting up the design, it appears that situations 
where these assumptions hold to a reasonable approximation can usu- 
ally be obtained in practice. A discussion of the generality of the situa- 
tions covered by (I)-(II) is given in the Validity of Assumptions sec- 
tion. 

Although they are valid under very general conditions, the non- 
parametric results would be of little value if they had exceedingly low 
efficiencies. To obtain an approximate lower bound for the efficiency 
of the nonparametric results, it was assumed that conditions (i)-(iii) 
were satisfied and in addition that the common population variance 
was known. Then the power efficiency (see references [1] and [2]) of the 
nonparametric tests was computed with respect to the corresponding 
most powerful tests. This procedure is equivalent to assuming (i)-(iii)
and comparing the nonparametric tests with the corresponding t-tests
based on an infinite number of degrees of freedom. The lowest
efficiencies obtained are in the neighborhood of 50 per cent. Interpreted
roughly, this means that at most 50 per cent of the “information” con- 
tained in the data is “lost” even for very specialized situations. Most 
of the nonparametric tests considered have efficiencies much higher 
than 50 per cent. Thus the efficiency of the nonparametric results ap- 
pears to be reasonably high in most cases and never is extremely low. 
The efficiency derivations and results are presented in the section titled 
Efficiency Investigation. 

The validity of the nonparametric results is independent of the 
amount of correlation within blocks. This fact can be exploited in de- 
signing experiments which have high precision. The procedure followed 
is to design the experiment so as to obtain large positive correlation 
within blocks and then use the nonparametric results. This will often
furnish a substantially higher precision than the procedure of obtaining
approximate independence among all observations and then using the 
t-statistic. The reason for the increased precision is that the variances 
of the statistics used for the nonparametric results are nearly always 
appreciably decreased when there is high positive correlation within 
blocks. A discussion of the effect of high positive correlation on the 
precision of the nonparametric results is contained in the Correlation 
Discussion section. 

A method of obtaining large positive correlation within blocks con- 
sists in choosing the positions within a block very “near” to each other. 
As an example, in agricultural experimentation a block might be made 
up of nearby plots of ground. As another example, in an industrial ex- 
periment a block might consist of observations which are near in the 
sense of time. Since “nearby” observations are exposed to almost iden- 
tical conditions, they will often have high positive correlation. Also the 
observation for a treatment should have approximately the same
expected value for each possible position of the treatment in the block.
Thus high positive correlation will still exist after randomization. 

The next section contains a description of the mathematical model
and the type of experimental design considered. The section following
that is titled Statement of Results. This section contains a statement
of the nonparametric tests and confidence intervals presented in this
paper along with some numerical examples of their application.


MATHEMATICAL MODEL 


In order to present a technically accurate description of the type of 
experimental design and of the mathematical model without undue use 
of space, some moderately complicated notation is used. This notation 
is also convenient for the statement of the nonparametric tests and con- 
fidence intervals presented further on in the paper. To assist the reader 
who is unfamiliar with this type of mathematical description for experi- 
mental designs, a notation similar to that presented in reference [3] 
was used. The excellent explanations and descriptions given in [3, pp. 
39-85] may be of help in the understanding of this paper. 

First let us describe the type of experimental design considered. The
purpose of the investigation is to compare the effects of $m$ different
treatments. The design is such that the experiment is laid out in $n$
separate blocks. The $i$th treatment occurs $u_{ij}$ times in the $j$th block and
$u_{ij} \ge 1$ $(i = 1, \dots, m;\ j = 1, \dots, n)$. The $i$th treatment may occur
several times in the $j$th block; let these occurrences be ordered and use
$k = k(i, j)$ to denote the $k$th occurrence in this ordering $(1 \le k \le u_{ij})$.
The observation value obtained for the $k$th occurrence of the $i$th treatment
in the $j$th block will be denoted by $y_{ijk}$. The position of the treatments
within a block is determined by an independent randomization
process so that all permissible arrangements are equally likely.

The mathematical model adopted in this paper assumes that $y_{ijk}$
can be written in the form


(1)    $y_{ijk} = \mu + \tau_i + \beta_j + e_{ijk}$,


where $\mu$, $\tau_i$, $\beta_j$ are parameters and $e_{ijk}$ is a random variable with zero
mean. Here $\mu$ represents the general effect, $\tau_i$ the effect of the treatment,
and $\beta_j$ the effect of the block; the quantities $\tau_i$ and $\beta_j$ satisfy the
relations $\sum_i \tau_i = \sum_j \beta_j = 0$. A discussion of this mathematical model is
contained in [3, pp. 41-85].

Let us consider some implications of the independent randomization
process used within a block. The joint distribution of the $y_{ijk}$ within a
block is an average of the joint distributions which hold for each
permissible arrangement of the treatments within the block. Thus
$\mu + \tau_i + \beta_j$ is an average of the expected values (means) of the $i$th treatment
for all permissible locations in the $j$th block. It is not the expected
value of this treatment for the particular location which happened to 
be selected by the randomization process. In some experimental design 
procedures, the quantity considered is the expected value for the par- 
ticular location chosen rather than an average of the expected values 
over all permissible locations. The Validity of Assumptions analysis 
presented in this paper does not completely apply to such procedures. 
It would seem, however, that investigation of average expected values 
is preferable to investigation of the expected values for the locations 
selected by the randomization process. 

Given that the $y_{ijk}$ satisfy the mathematical model specified by
relation (1), two problems of interest (see [3, p. 44]) are

(a). Testing the null hypothesis that a specified linear combination
of the $\tau_i$ has a given hypothetical value.

(b). Constructing confidence intervals for a specified linear combination
of the $\tau_i$.

In both (a) and (b), the sum of the coefficients for the linear combination
considered is zero (examples: $\tau_1 - \tau_3$, $3\tau_1 - 5\tau_2 + \tau_3 + \tau_4$, etc.).

When conditions (i)-(iii) are satisfied, the usual method of handling
(a) and (b) consists in applying the appropriate t-statistic. The words
Student’s Hypothesis in Experimental Designs contained in the title of
this paper refer to (a) and (b). Thus the nonparametric results presented
are methods of handling (a) and (b) when conditions (I) and
(II) are the only ones which necessarily hold.

One of the usual purposes of an experiment is to compare the effects
of two treatments. Consequently, the linear functions of the $\tau$’s most
commonly considered are of the form $\tau_i - \tau_{i'}$. However, other linear
functions of the $\tau$’s are often of interest (see, e.g., [3, pp. 56-64]).


STATEMENT OF RESULTS 


Let us consider the nonparametric method for handling (a) and (b).
The values of the observations (i.e., the $y_{ijk}$) are assumed to be random
variables which satisfy the mathematical model (1) and the conditions
(I), (II). The value of $n$ (the number of blocks) must be at least 4.

The linear function of the $\tau_i$ being investigated can always be
expressed in the form


(2)    $w_1\tau_1 + w_2\tau_2 + \cdots + w_m\tau_m$,


where $\sum_i w_i = 0$. For example, if $\tau_1 - \tau_m$ is the function to be
investigated, then $w_1 = 1$, $w_2 = \cdots = w_{m-1} = 0$, $w_m = -1$. The procedure
followed will be to derive nonparametric tests and confidence intervals
for the general form (2). Then any particular selection for the linear
function of the $\tau$’s is treated by substituting for the $w$’s those values
which make (2) the same as the function considered.

The first step in obtaining the nonparametric results consists in
forming the dummy observations $Y_1, Y_2, \dots, Y_n$, where

    $Y_j = w_1\bar{y}_{1j} + w_2\bar{y}_{2j} + \cdots + w_m\bar{y}_{mj}$,    $(j = 1, \dots, n)$

and

    $\bar{y}_{ij} = \sum_{k=1}^{u_{ij}} y_{ijk} / u_{ij}$,    $(i = 1, \dots, m)$

= average of observations which were obtained by applying the $i$th
treatment in the $j$th block.
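
The construction of the dummy observations can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `dummy_observations`, the nested-list layout `y[j][i]` (observations of treatment i in block j), and the numerical values are hypothetical and do not come from the paper.

```python
from statistics import mean


def dummy_observations(y, w):
    """Return Y_j = sum_i w_i * (mean of treatment i in block j) for each block.

    y[j][i] is the list of observations for treatment i in block j
    (a hypothetical layout chosen for this sketch); w holds the weights w_i.
    """
    if abs(sum(w)) > 1e-12:
        raise ValueError("the weights w_i must sum to zero")
    result = []
    for block in y:
        if len(block) != len(w) or any(len(obs) == 0 for obs in block):
            # The design requires u_ij >= 1 for every treatment in every block.
            raise ValueError("each block needs at least one observation per treatment")
        result.append(sum(w_i * mean(obs) for w_i, obs in zip(w, block)))
    return result


# Hypothetical data: two blocks, three treatments, unequal u_ij.
blocks = [
    [[0.70, 0.68], [0.64], [0.75]],
    [[0.89], [0.77, 0.75], [0.80]],
]
Y = dummy_observations(blocks, w=[1, -1, 0])  # contrast tau_1 - tau_2
```

Only the within-block treatment averages enter each $Y_j$, so unequal $u_{ij}$ cause no difficulty as long as every treatment appears at least once in every block.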

Since the design requires that all $u_{ij} \ge 1$, it follows from (1) and
$\sum_i w_i = 0$ that the $Y$’s all have expected values equal to (2). The
statistical independence among blocks, condition (I), implies that the
$Y$’s are statistically independent. Since each $Y_j$ represents a linear
combination of treatment effects within the same block, it follows from
condition (II) that the $Y$’s have continuous symmetrical probability
distributions. Thus $Y_1, \dots, Y_n$ represent a set of $n$ independent
observations from continuous symmetrical populations with a common
median value equal to (2). Consequently the results of references [1]
and [2] can be applied directly to obtain significance tests and
confidence intervals for $\sum_i w_i \tau_i$ on the basis of the $Y$’s.


TABLE 1

SOME TESTS AND EFFICIENCIES FOR $4 \le n \le 15$

Symmetrical: accept $\phi \ne \phi_0$ if either one-sided condition holds.

 n   One-sided    Symmetrical  One-sided: accept $\phi < \phi_0$ if               One-sided: accept $\phi > \phi_0$ if               Approx.
     level (%)    level (%)                                                                                                          efficiency (%)
 --  -----------  -----------  -------------------------------------------------  -------------------------------------------------  --------------
 4   6.2          12.5         $x_4 < \phi_0$                                     $x_1 > \phi_0$                                     55
 5   approx. 5    approx. 10   $.63x_5 + .37x_4 < \phi_0$                         $.63x_1 + .37x_2 > \phi_0$                         65
 5   3.1          6.2          $x_5 < \phi_0$                                     $x_1 > \phi_0$                                     52
 6   4.7          9.4          $\max[x_5, \tfrac12(x_4+x_6)] < \phi_0$            $\min[x_2, \tfrac12(x_1+x_3)] > \phi_0$            69
 6   approx. 2.5  approx. 5    $.63x_6 + .37x_5 < \phi_0$                         $.63x_1 + .37x_2 > \phi_0$                         60
 6   1.6          3.2          $x_6 < \phi_0$                                     $x_1 > \phi_0$                                     48
 7   5.5          10.9         $\max[x_5, \tfrac12(x_4+x_7)] < \phi_0$            $\min[x_3, \tfrac12(x_1+x_4)] > \phi_0$            74
 7   approx. 1    approx. 2    $.785x_7 + .315x_6 < \phi_0$                       $.785x_1 + .315x_2 > \phi_0$                       52
 8   4.3          8.6          $\max[x_6, \tfrac12(x_4+x_8)] < \phi_0$            $\min[x_3, \tfrac12(x_1+x_5)] > \phi_0$            74
 8   approx. 1.0  approx. 2.0  $\max[x_7, (.5x_8 + .28x_6 + .22x_7)] < \phi_0$    $\min[x_2, (.5x_1 + .28x_3 + .22x_2)] > \phi_0$    59
 9   5.1          10.2         $\max[x_6, \tfrac12(x_4+x_9)] < \phi_0$            $\min[x_4, \tfrac12(x_1+x_6)] > \phi_0$            74
 9   1.0          2.0          $\max[x_8, \tfrac12(x_5+x_9)] < \phi_0$            $\min[x_2, \tfrac12(x_1+x_5)] > \phi_0$            61
 10  5.6          11.1         $\max[x_6, \tfrac12(x_4+x_{10})] < \phi_0$         $\min[x_5, \tfrac12(x_1+x_7)] > \phi_0$            73
 10  1.1          2.1          $\max[x_8, \tfrac12(x_6+x_{10})] < \phi_0$         $\min[x_3, \tfrac12(x_1+x_5)] > \phi_0$            67
 11  2.8          5.6          $\max[x_7, \tfrac12(x_5+x_{11})] < \phi_0$         $\min[x_5, \tfrac12(x_1+x_7)] > \phi_0$            70
 12  1.0          2.0          $\max[x_9, \tfrac12(x_6+x_{12})] < \phi_0$         $\min[x_4, \tfrac12(x_1+x_7)] > \phi_0$            69
 13  0.5          1.0          $\max[x_{10}, \tfrac12(x_7+x_{13})] < \phi_0$      $\min[x_4, \tfrac12(x_1+x_7)] > \phi_0$            67
 14  1.0          2.0          $\max[x_{10}, \tfrac12(x_6+x_{14})] < \phi_0$      $\min[x_5, \tfrac12(x_1+x_9)] > \phi_0$            70
 15  0.5          1.0          $\max[x_{11}, \tfrac12(x_7+x_{15})] < \phi_0$      $\min[x_5, \tfrac12(x_1+x_9)] > \phi_0$            68

Let us consider problem (a). Then the value of $\sum_i w_i \tau_i$ is to be
compared with a given hypothetical value $\phi_0$. For notational simplicity,
let $\phi = \sum_i w_i \tau_i$; also let $x_1, x_2, \dots, x_n$ denote the values of the $Y$’s
arranged in increasing order of magnitude. Then the order statistic
and median notation is the same as that used in [1] and [2]. Tables 1
and 2 of [1] contain some one-sided and symmetrical significance tests
for $4 \le n \le 15$. In these tables the significance levels for the one-sided
tests are approximately 0.5%, 1%, 2.5%, 5%, while the significance
levels for the symmetrical tests are approximately 1%, 2%, 5%, 10%.
For other significance levels or for $n > 15$, appropriate tests can usually
be constructed by direct application of the theory developed in [2]. To
save space, the tables of reference [1] will not be reproduced in this
paper. However, a small number of tests are listed in Table 1 (of this
paper) to be used in conjunction with the efficiency investigation.
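
The exact significance levels quoted for tests of this kind can be checked by enumeration. The sketch below rests on an argument that is assumed here rather than quoted from [1] or [2]: when $\phi = \phi_0 = 0$ and the $Y$’s are independent, continuous and symmetrical about zero, each of the $2^n$ assignments of signs to the ordered absolute values is equally likely, and whether a statistic such as $\max[x_5, \tfrac12(x_4 + x_6)]$ is negative depends only on those signs and on the ranking of the absolute values. The function name `exact_level` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def exact_level(n, accepts):
    """Fraction of the 2**n equally likely sign patterns for which `accepts` holds.

    The magnitudes 1..n stand in for the ranked absolute values; `accepts`
    receives the ordered sample x, with x[0] playing the role of x_1.
    """
    count = 0
    for signs in product((-1, 1), repeat=n):
        x = sorted(s * r for s, r in zip(signs, range(1, n + 1)))
        if accepts(x):
            count += 1
    return Fraction(count, 2 ** n)


# One-sided tests of phi < phi_0 with phi_0 = 0, taken from Table 1.
level_n4 = exact_level(4, lambda x: x[3] < 0)                            # x_4 < 0
level_n6 = exact_level(6, lambda x: max(x[4], (x[3] + x[5]) / 2) < 0)    # n = 6
level_n7 = exact_level(7, lambda x: max(x[4], (x[3] + x[6]) / 2) < 0)    # n = 7
```

Under these assumptions the three levels come out as 1/16, 3/64 and 7/128, in agreement with the 6.2%, 4.7% and 5.5% entries of Table 1. The enumeration does not apply to the tests marked "approx.", whose levels depend on the actual magnitudes and not only on their ranking.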

Confidence intervals for (b) can also be obtained from Tables 1 and
2 of [1]. As an example of the method of obtaining a confidence interval
from a test, let $n = 5$. Then the one-sided test


    Accept $\phi < \phi_0$ if $.63x_5 + .37x_4 < \phi_0$


has a significance level of approximately 5%. Since the significance
level of this test equals the probability of the relation defined by the
test holding when $\phi = \phi_0$,


    $\Pr(.63x_5 + .37x_4 < \phi) = .05$.

Consequently,

    $\Pr(.63x_5 + .37x_4 > \phi) = .95$

and

    $(-\infty,\ .63x_5 + .37x_4)$


is a one-sided confidence interval for $\phi$ with a confidence coefficient of
approximately 95%. As another example, let $n = 12$. Then the
symmetrical test


    Accept $\phi \ne \phi_0$ if either $\max[x_9, \tfrac12(x_6 + x_{12})] < \phi_0$ or $\min[x_4, \tfrac12(x_1 + x_7)] > \phi_0$


with a significance level of 2% yields the symmetrical confidence
interval


    $(\min[x_4, \tfrac12(x_1 + x_7)],\ \max[x_9, \tfrac12(x_6 + x_{12})])$


with a confidence coefficient of 98%. Etc.

To demonstrate the application of the results presented in this
section, a numerical example will be worked out. Let $m = 3$, $n = 6$, and


.703


.686 
.674 


.637 


548 
887 


773 
759 


746 


Here the horizontal lines represent the separation of the observations
into blocks while spaces separate different treatments within the same
block.


.680 
.637 
.048 
.887 
.766 
.746 
First let us test whether the value of $\tau_1$ exceeds the value of $\tau_2$ at the
1.6% significance level. Then $w_1 = 1$, $w_2 = -1$, $w_3 = 0$ so that

(3)    $x_1 = .043$,  $x_2 = .058$,  $x_5 = .120$,  $x_6 = .121$.
The test used is ($\phi_0 = 0$)

    Accept $\tau_1 - \tau_2 > 0$ if $x_1 > 0$.


Since $.043 > 0$, $\tau_1$ is significantly greater than $\tau_2$ at the 1.6%
significance level. Next let us obtain a symmetrical confidence interval for
$\tau_1 - \tau_2$ with a confidence coefficient of 90.6%. The confidence interval
used is


    $(\min[x_2, \tfrac12(x_1 + x_3)],\ \max[x_5, \tfrac12(x_4 + x_6)])$.


Substituting in the numerical values for the $x$’s from (3), the desired
symmetrical confidence interval is


    $(.052,\ .120)$.


Thus there is a probability of .906 that this interval includes the true
value of $\tau_1 - \tau_2$.
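
The interval computation for $n = 6$ can be sketched as follows. The function name `symmetric_interval_n6` is hypothetical. Of the six input values, .043, .058, .120 and .121 are the ones quoted in the text; the two middle values (.061 and .090) are placeholders assumed for this sketch only and should not be read as the paper's data.

```python
def symmetric_interval_n6(y):
    """Interval (min[x_2, (x_1+x_3)/2], max[x_5, (x_4+x_6)/2]) for six dummy observations."""
    if len(y) != 6:
        raise ValueError("this particular interval is defined for n = 6 only")
    x = sorted(y)  # x[0] is x_1, ..., x[5] is x_6
    lower = min(x[1], (x[0] + x[2]) / 2)
    upper = max(x[4], (x[3] + x[5]) / 2)
    return lower, upper


# .061 and .090 are ASSUMED placeholder values; the other four appear in the text.
values = [0.121, 0.043, 0.090, 0.058, 0.120, 0.061]
lower, upper = symmetric_interval_n6(values)

# One-sided test of tau_1 - tau_2 > 0 at the 1.6% level: accept if x_1 > 0.
tau1_exceeds_tau2 = min(values) > 0
```

With these inputs the lower limit is set by $\tfrac12(x_1 + x_3)$ and the upper limit by $x_5$, so the interval depends on the placeholder for $x_3$ but not on the one for $x_4$.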

Finally let us test whether $\tau_2 + .05$ exceeds $\tfrac12(\tau_1 + \tau_3)$ at approximately
the 2.5% significance level. Then $w_1 = -\tfrac12$, $w_2 = 1$, $w_3 = -\tfrac12$ so that


    $x_1 = -.060$    $x_3 = -.051$    $x_5 = -.016$
    $x_2 = -.054$    $x_4 = -.021$    $x_6 = -.023$.


For this case $\phi_0 = -.05$ and the test used is

    Accept $\tau_2 - \tfrac12(\tau_1 + \tau_3) > -.05$ if $.63x_1 + .37x_2 > -.05$.


Since both $x_1$ and $x_2$ are less than $-.05$, $\tau_2 + .05$ is not significantly
greater than $\tfrac12(\tau_1 + \tau_3)$ at the 2.5% significance level.


EFFICIENCY INVESTIGATION 


Let us consider the power efficiencies of some of the nonparametric tests
for (a) which are based on Tables 1 and 2 of [1]. In this section it is
assumed that conditions (i)-(iii) are satisfied and the value of the
common population variance $\sigma^2$ is known.

The power efficiency of a nonparametric test will be expressed as a
percentage. It has the interpretation that the corresponding most
powerful test (same significance level and alternative hypothesis) based
on this percentage of the data used by the nonparametric test has
approximately the same power function as the nonparametric test. Here
a nonparametric test is based on the observations obtained by using the
$i$th treatment $u_{ij}$ times in the $j$th block $(i = 1, \dots, m;\ j = 1, \dots, n)$.
The corresponding most powerful test is said to be based on $100r\%$ of
the data used by the nonparametric test if the $i$th treatment is only
used $r u_{ij}$ times in the $j$th block. The problem is to determine the value
of $r$ such that the two tests have approximately the same power
function. Then the power efficiency of the nonparametric test is defined to
be $100r\%$.

The analysis of this paper is limited to the particular case where $u_{ij}$
has the same value for all $j$. This restriction greatly simplifies the
computations. However, the resulting power efficiencies should be
representative of those for the general case. In what follows, $u_i$ will be used
to denote the common value of the $u_{ij}$.

For a given significance level $\alpha$ and a given value of $r$, the most
powerful one-sided test of $\phi < \phi_0$ is (neglecting integer restrictions on
the $r u_i$)


(4)    Accept $\phi < \phi_0$ if $\sqrt{n}\,(\bar{x} - \phi_0) \Big/ \sigma \sqrt{\sum_i w_i^2/(r u_i)} < -K_\alpha$,


where $\bar{x} = \sum_j x_j / n$ and $K_\alpha$ is the value of the standardized normal
deviate exceeded with probability $\alpha$. That is, $K_\alpha$ is determined by the
relation


    $\frac{1}{\sqrt{2\pi}} \int_{K_\alpha}^{\infty} e^{-x^2/2}\,dx = \alpha$.


From symmetry considerations, it is sufficient to restrict the efficiency
investigation to one-sided tests of $\phi < \phi_0$. In Tables 1 and 2 of [1], the
power efficiency of a one-sided test of $\phi < \phi_0$ equals the power efficiency
of the corresponding one-sided test of $\phi > \phi_0$ listed on the same line. It
also equals the power efficiency of the symmetrical test based on these
two one-sided tests.

Given a nonparametric test of $\phi < \phi_0$, the problem is to determine
the value of $r$ so that the power function of test (4) at the same
significance level approximately equals the power function for the
nonparametric test. Let $\alpha$ be the significance level for the nonparametric test.
Then the power function values $\epsilon$ for the corresponding most powerful
test are exactly determined by the relation


    $K_\epsilon = K_\alpha - \sqrt{n}\,(\phi_0 - \phi) \Big/ \sigma \sqrt{\sum_i w_i^2/(r u_i)}$.


Let $100E\%$ be the power efficiency listed for the nonparametric test
in [1]. In this section, the nonparametric test can be considered to be
based on a sample of size $n$ from a normal population with mean equal
to $\phi$ and variance equal to $\sigma^2 \sum_i w_i^2/u_i$. Using the transitivity property of
power function equivalence (see [2]), the nonparametric test is
approximately power function equivalent to a t-test based on $nE$ sample values
from a normal population with mean $\phi$ and variance $\sigma^2 \sum_i w_i^2/u_i$. A
modification of the approximation to the power function of a t-test
given in [4] yields that the power function values $\epsilon'$ of the
nonparametric test are approximately determined by the relation


    $K_{\epsilon'} = K_\alpha - \sqrt{nE}\,(\phi_0 - \phi) \left[1 - \frac{K_\alpha^2}{2(nE - 1)}\right]^{1/2} \Big/ \sigma \sqrt{\sum_i w_i^2/u_i}$.


Equating $K_\epsilon$ and $K_{\epsilon'}$, the nonparametric and most powerful tests are
approximately power function equivalent when


    $r = \left[1 - \frac{K_\alpha^2}{2(nE - 1)}\right] E$.


Thus the power efficiency of a one-sided nonparametric test of $\phi < \phi_0$ is
approximately


    $100 \left[1 - \frac{K_\alpha^2}{2(nE - 1)}\right] E\,\%$


if $\alpha$ is its significance level and $100E\%$ is the efficiency for normality
listed for this test in [1].
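
The efficiency formula is simple to evaluate numerically. The following Python sketch is illustrative only: the function name `power_efficiency` is hypothetical, the term $K_\alpha^2/2(nE - 1)$ is read as $K_\alpha^2$ divided by $2(nE - 1)$, and the value $E = 0.95$ is an assumed input chosen for the example, not a figure taken from [1].

```python
from statistics import NormalDist


def power_efficiency(alpha, n, E):
    """Return r = [1 - K_alpha**2 / (2 * (n*E - 1))] * E as a fraction (not a percentage).

    alpha: one-sided significance level of the nonparametric test
    n:     number of blocks
    E:     efficiency for normality of the test, as a fraction (assumed input)
    """
    k_alpha = NormalDist().inv_cdf(1.0 - alpha)  # normal deviate exceeded with probability alpha
    denominator = 2.0 * (n * E - 1.0)
    if denominator <= 0.0:
        raise ValueError("n * E must exceed 1 for the approximation to make sense")
    return E * (1.0 - k_alpha ** 2 / denominator)


# Illustration: one-sided test with n = 6 at level 1/64 (about 1.6%), E ASSUMED to be 0.95.
r = power_efficiency(alpha=1 / 64, n=6, E=0.95)
```

The correction factor grows as $\alpha$ decreases or as $nE$ becomes small, which is why the lowest efficiencies occur for small $n$ at stringent significance levels.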

Efficiencies were computed for the tests of Table 1. The values ob- 
tained for the tests of Table 1 should furnish a satisfactory indication 
of the power efficiency behavior for the tests listed in [1]. Examination 
of the values listed in Table 1 shows that the efficiencies obtained vary
between 48% and 74%. The usual efficiency lies in the neighborhood 
of 65%. Although these efficiencies are not large, it must be remem- 
bered that the situation considered is extreme. 

The close relationship between a significance test and the confidence 
interval on which it is based indicates that the efficiency magnitudes 
obtained for the tests also are roughly applicable to the corresponding 
confidence intervals. 


CORRELATION DISCUSSION 


Let us consider the effect of large positive correlation within blocks
on the precision of the nonparametric results. To simplify the
discussion, it will be assumed that within the $j$th block each $y$ has the same
variance $\sigma_j^2$ $(j = 1, \dots, n)$. Also the correlation between any two $y$’s
in the same block is assumed to have the value $\rho$, where $\rho$ is the same
for all blocks; that is, the correlation between $y_{i_1 j k_1}$ and $y_{i_2 j k_2}$ has the
common value $\rho$ for all permissible selections of $i_1$, $i_2$, $k_1$, $k_2$, and $j$.
There is statistical independence among blocks as required by condition
(I). Then it can be shown that the variance of $Y_j$ has the value


    $\sigma_j^2 \sum_i w_i^2 / u_{ij}'$,    $(j = 1, \dots, n)$,

where

    $u_{ij}' = u_{ij} / (1 - \rho)$.


Thus, from the viewpoint of variance, introduction of the correlation
is equivalent to having zero correlation within blocks and using the $i$th
treatment $u_{ij}'$ rather than $u_{ij}$ times in the $j$th block. Stated in another
way, from the viewpoint of variance, introduction of the correlation is
equivalent to increasing the size of the experiment in the ratio $1/(1 - \rho)$.
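
The variance formula can be checked numerically by summing the full covariance matrix of the observations in a block. The sketch below assumes exactly the equal-variance, equal-correlation structure described above; the function names and the numerical inputs are hypothetical.

```python
def var_Y_formula(w, u, sigma2, rho):
    """Closed form: sigma_j^2 * sum_i w_i^2 / u'_ij with u'_ij = u_ij / (1 - rho)."""
    return sigma2 * (1.0 - rho) * sum(wi * wi / ui for wi, ui in zip(w, u))


def var_Y_direct(w, u, sigma2, rho):
    """Variance of Y_j computed term by term from the covariance matrix."""
    # Each observation of treatment i enters Y_j with coefficient w_i / u_ij.
    coeffs = [wi / ui for wi, ui in zip(w, u) for _ in range(ui)]
    total = 0.0
    for a, ca in enumerate(coeffs):
        for b, cb in enumerate(coeffs):
            covariance = sigma2 if a == b else rho * sigma2
            total += ca * cb * covariance
    return total


# Hypothetical block: three treatments occurring 2, 1 and 3 times.
w = [1.0, -0.5, -0.5]
u = [2, 1, 3]
direct = var_Y_direct(w, u, sigma2=4.0, rho=0.9)
closed = var_Y_formula(w, u, sigma2=4.0, rho=0.9)
gain = var_Y_formula(w, u, 4.0, 0.0) / closed  # ratio 1 / (1 - rho)
```

The agreement of the two computations depends on $\sum_i w_i = 0$: the common-correlation part of the covariance matrix contributes $\rho \sigma_j^2$ times the square of the sum of the coefficients, which vanishes for such combinations.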

As an example, let $\rho = .9$. Then the precision of the experiment is the
same as that of a corresponding experiment with zero correlation within
blocks and the $i$th treatment occurring $10u_{ij}$ times in the $j$th block.
The analysis is based on the simplified (and often unrealistic) assump- 
tion of the same correlation between any two observations in the same 
block and equal correlation for all blocks. However, it should be suffi- 
cient to show that the precision can be greatly increased by designing 
the experiment to yield high positive correlation within blocks and then 
using the nonparametric results. 


VALIDITY OF ASSUMPTIONS 


This section contains an examination of the validity of conditions 
(i)-(iii) from the viewpoint of practical applications. A more general 
condition which is sometimes used to replace (i) is also considered. 
Finally, a discussion of the validity of conditions (I) and (II) is pre- 
sented. 

Let us consider the joint probability distribution of the observations 
within a block. Since the position of the treatments is determined at 
random, this distribution is an average of the joint distributions which 
hold for each permissible arrangement of the treatments within the 
block. This fact makes reasonably accurate evaluation of the amount 
of correlation within a block very difficult. For example, if the positions 
within a block are of a homogeneous nature, so that the expected value 
of the observation yielded by a treatment is nearly the same for all 
eligible positions, then the correlation after randomization is roughly 
an average of the correlations for all permissible arrangements of the 
treatments. On the other hand, if the expected values for a treatment
differ greatly from position to position (compared to the standard
deviations), the correlation after randomization is almost completely
determined by the expected values. In this second case the resulting
correlation is usually negative; sometimes this negative correlation is
large enough to affect greatly the magnitude of standard deviations
for combinations of observations. Between these two extreme cases,
there are a variety of intermediate situations. Also some treatments
may tend to one of these extreme cases while some of the remaining
treatments tend to the other extreme.

A common practical situation is the one where strong positive
correlation exists for every permissible arrangement of treatments within a
block. Then after randomization the amount of correlation could be
anywhere between strong positive and strong negative. For situations 
of this nature, even a moderately accurate evaluation of the order of 
magnitude of the correlation is extremely difficult. The amount of 
correlation between one pair of treatments may be substantially dif- 
ferent from that between another pair of treatments. Also the magni- 
tude of the correlation may vary greatly from block to block. In 
general there does not appear to be any easy way of detecting these 
correlation variations within and between blocks. 
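
A two-position illustration may make the effect of randomization on the correlation concrete. It rests on simplifying assumptions that are not made in the paper: a block has two positions with additive position effects $p_1$ and $p_2$; treatments $A$ and $B$ are assigned to the two positions by a fair coin, with $\pi(\cdot)$ denoting the position received; and for either arrangement the errors $e_A$, $e_B$ have variance $\sigma^2$ and correlation $\rho_0$, independently of the assignment. The symbols $p_1$, $p_2$, $\pi$ and $\rho_0$ are introduced only for this sketch.

```latex
% Assumed model (illustrative only): y_A = tau_A + p_{pi(A)} + e_A, and similarly for B,
% with pi(A) = 1 or 2 with probability 1/2 each and pi(B) the other position.
\begin{align*}
\operatorname{Var}(y_A) &= \sigma^2 + \tfrac14 (p_1 - p_2)^2, \\
\operatorname{Cov}(y_A, y_B) &= \rho_0\,\sigma^2 + \Bigl[p_1 p_2 - \tfrac14 (p_1 + p_2)^2\Bigr]
                              = \rho_0\,\sigma^2 - \tfrac14 (p_1 - p_2)^2, \\
\operatorname{Corr}(y_A, y_B) &= \frac{\rho_0\,\sigma^2 - \tfrac14 (p_1 - p_2)^2}
                                      {\sigma^2 + \tfrac14 (p_1 - p_2)^2}.
\end{align*}
```

Under these assumptions the correlation after randomization equals $\rho_0$ when $p_1 = p_2$ and tends to $-1$ as $|p_1 - p_2|/\sigma$ grows, so a strong positive correlation for each fixed arrangement can become anything between strongly positive and strongly negative after randomization.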

On the basis of the above considerations, it appears that acceptance 
of the conclusion that the correlation is the same between any two 
treatments in the same block and has the same value for all blocks is 
almost impossible to justify on intuitive grounds. There are many easy 
ways in which this conclusion could be violated. For many cases of 
practical importance even gross violations are difficult to detect. In 
particular this applies to the conclusion that there is zero correlation 
within each block, which is necessary if condition (i) is to be satisfied. 
It also applies to a more general condition which is sometimes used to 
replace (i). This more general condition consists in assuming that there 
is a correlation p between any two observations within the same block 
and that the value of p is the same for all blocks. It is also assumed 
that there is independence among blocks. On the basis of the more 
general condition combined with (ii) and (iii), problems (a) and (b) are 
handled by statistics having approximate t-distributions (see, e.g., [3,
Chapter 7]). 

Next consider condition (ii). If the blocks are not near each other, 
there is seldom any strong reason for believing that observations from 
different blocks have even approximately the same variance. Within 
the same block, different treatments sometimes have variances which 
differ noticeably. Unless a reasonably accurate evaluation of the
correlations is available, variance comparisons based on the data from the
experiment are of doubtful value. Consequently it appears that there
are many practical situations when (ii) is of questionable validity even 
in the sense of being a reasonable approximation; also that checking 
the validity of condition (ii) on the basis of the observations is not 
easy. 

Finally consider condition (iii). After randomization, the probability 
distribution of an observation yielded by a treatment is an average of 
the distributions for that treatment at each eligible position in the 
block. If the expected values of the distributions for each eligible po- 
sition differ even moderately, it seems likely that the average distribu- 
tion will deviate noticeably from normality. Usually the peakedness of 
this average distribution will be nowhere near as great as that for 
normality. Also the shape of the average distribution may be of a very 
uneven nature. Consequently condition (iii) seems to be of doubtful 
validity for many practical situations. 

The above discussion indicates that practical situations where con- 
ditions (i)-(iii) hold to a reasonable approximation do not occur too 
often. Violations of (i)-(iii) can greatly affect the validity of results 
based on the t-statistic (see, e.g., [3, pp. 83-84]). 

The nonparametric tests and confidence intervals were introduced 
to replace those based on the t-statistic for cases where the conditions 
assumed for the t-statistic appear to be violated or are of a doubtful 
nature. Let us investigate the two conditions which are sufficient for 
the validity of the nonparametric results.

First consider condition (I). If the experiment is carefully designed, 
it appears that this condition can always be satisfied to a reasonably 
close approximation. Ordinarily the blocks can be placed sufficiently 
far “apart” (in the sense of distance, time, etc.) to warrant an assump- 
tion of independence. 

Finally consider condition (II). If the data are for a continuous type 
of variable (such as length, weight, time, etc.), it is usually permissible 
to assume that the probability distributions (cumulative) encountered 
are continuous. This applies in particular to any linear combination (of 
observations). Hence the continuity part of (II) appears acceptable for 
most practical situations using a continuous type of variable. 

Within a block, the probability distribution for a linear combination 
of treatment effects equals an average of the distributions of that com- 
bination for all permissible assignments of the treatments in that block. 
The number of permissible assignments is usually very large. For the 
ordinary type of practical situation, there should not be a huge varia- 
tion in the shapes of the distributions for permissible assignments. 
Thus, if there is at least a moderate variation among the mean values
of these distributions and the dispersal of these mean values is some- 
what symmetrical (which should be the usual case for a large number 
of permissible assignments), the average distribution should be ap- 
proximately symmetrical. If there is only small variation among the 
mean values, the facts that sums of observations are being dealt with 
and that the distribution of an observation is often roughly symmetrical 
indicate that the average distribution will be approximately symmetri- 
cal. Consequently it would appear that condition (II) is intuitively 
acceptable in many practical applications. 


REFERENCES 


[1] Walsh, John E., “Applications of Some Significance Tests for the Median 
Which Are Valid Under Very General Conditions,” Journal of the American
Statistical Association, 44 (1949), 342-55. 

[2] Walsh, John E., “Some Significance Tests for the Median Which Are Valid 
Under Very General Conditions,” Annals of Mathematical Statistics, 20 
(1949), 64-81. 

[3] Cochran, W. G., and Cox, G. M., Experimental Designs. New York: John 
Wiley and Sons, 1950. 

[4] Johnson, N. L., and Welch, B. L., “Applications of the Non-Central t-Dis- 
tribution,” Biometrika, 31 (1940), 376. 











THE POWER OF TWO DIFFERENCE-SIGN TESTS 


ALAN STUART 
London School of Economics 


1. Introduction and Summary 


SITUATIONS arise in which we are unable to make the assumptions
necessary for the application of standard theory based on the normal
distribution. Most of the distribution-free tests which have been
proposed for such situations are based on statistics which are very easy 
to compute, and this ease of computation goes some way to compensate 
for any information, available in the sample, which may be ignored. 
Quite apart from such situations, it is interesting to investigate the 
performances of distribution-free tests when the standard normal
situation in fact holds good, for if a test is consistent (i.e. if the
probability of rejecting a false null hypothesis tends to unity with increasing
sample size) there must come a point, for any set of alternatives, where
the loss of power involved in its use is negligible. In this paper two very 
simple tests are examined in this light. 

A distribution-free test of serial independence of N (unequal) obser- 
vations ordered in time, proposed by Moore and Wallis [6], consists in 
counting the number of positive first differences in the series. On the 
null hypothesis that the observations came from the same (continuous) 
population, every ordering of the observations is equally probable, so 
that the mean value and variance of the statistic are very simply ob- 
tained, and its distribution can easily be shown to be asymptotically 
normal. 

A lower bound for the power of the test against a general class of 
alternatives, implying a trend in the observations, was obtained by 
Mann [5]. This paper considers its power in the particular case where 
the alternative is a normal regression model with coefficient $\beta$ and
residual variance $\sigma^2$. The loss of power entailed by the use of this
test at the 95% level of significance is unimportant when either $N \ge 25$,
$\beta/(\sigma\sqrt{2}) \ge .5$, or $N \ge 75$, $\beta/(\sigma\sqrt{2}) \ge .3$.

The difference-sign test is easily generalized to the bivariate case for 
use in testing the correlation between two series. The approximate 
power of this second test is tabulated below against the alternative 
hypothesis that the pairs of observations were drawn from a bivariate 
normal population with non-zero correlation $\rho$. Much larger sample
sizes are required for the power of this test to approach that of the test
based on Fisher’s transformation of the correlation coefficient. For
$\rho \ge .5$, a sample of about 100 is necessary; for $\rho = .4$, N must be
about 200, and for $\rho = .3$, N must be about 400.


2. The null hypothesis 
A series of N observations has (N - 1) first differences. We define

$$D = \sum_{i=1}^{N-1} D_i$$

where $D_i$ is unity when the $i$th difference is positive, and zero otherwise.
We have

$$
\begin{aligned}
E(D) &= \sum_{i=1}^{N-1} E(D_i), \\
\operatorname{var} D &= \sum_{i=1}^{N-1} \operatorname{var} D_i
  + 2 \sum_{i<j} \operatorname{cov}(D_i, D_j).
\end{aligned}
\tag{1}
$$


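On the null hypothesis, with every ordering equally probable, any
$D_i$ is a 0-1 variable with equal probabilities. Further, the only
non-zero covariances are those involving triplets of adjacent observations,
i.e. $\operatorname{cov}(D_i, D_{i+1})$, and there are (N - 2) of these. If we
examine the 6 possible permutations of three unequal numbers and score
$D_1$, $D_2$ as defined above, we find that there is only one case out of the
six in which $D_1 = D_2 = 1$. Thus $E(D_1 D_2) = \tfrac{1}{6}$. We thus obtain

$$
\begin{aligned}
E(D_i) &= \tfrac{1}{2}, \\
\operatorname{var} D_i &= \tfrac{1}{2} - \left(\tfrac{1}{2}\right)^2 = \tfrac{1}{4}, \\
\operatorname{cov}(D_i, D_{i+1}) &= \tfrac{1}{6} - \left(\tfrac{1}{2}\right)^2 = -\tfrac{1}{12}, \\
\operatorname{cov}(D_i, D_{i+r}) &= 0 \qquad (r > 1),
\end{aligned}
\tag{2}
$$

and substitution in (1) gives, for the null hypothesis,

$$
\begin{aligned}
E(D) &= \tfrac{1}{2}(N-1), \\
\operatorname{var}(D) &= \tfrac{1}{4}(N-1) - \tfrac{1}{6}(N-2) = \tfrac{1}{12}(N+1).
\end{aligned}
\tag{3}
$$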
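The null moments in (3) can be checked by brute force for small N. The
following is a minimal sketch in Python, assuming nothing beyond the
definition of D given above; the function name `difference_sign_moments` is
ours and does not come from the paper. It enumerates every ordering of N
unequal values and computes the mean and variance of D in exact rational
arithmetic.

```python
from fractions import Fraction
from itertools import permutations


def difference_sign_moments(n):
    """Exact null mean and variance of D for a series of n unequal values.

    Every ordering of the n values is treated as equally probable, which is
    the null hypothesis of the difference-sign test.
    """
    counts = []
    for perm in permutations(range(n)):
        # D = number of positive first differences in this ordering.
        d = sum(1 for a, b in zip(perm, perm[1:]) if b > a)
        counts.append(d)
    total = len(counts)
    mean = Fraction(sum(counts), total)
    second_moment = Fraction(sum(c * c for c in counts), total)
    return mean, second_moment - mean ** 2


if __name__ == "__main__":
    for n in range(2, 8):
        mean, var = difference_sign_moments(n)
        print(n, mean, var, Fraction(n - 1, 2), Fraction(n + 1, 12))
```

The enumeration grows as N!, so it is only a check on the algebra for small
N, not a way of obtaining the distribution for series of practical length.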
3. The tendency to normality


Hoeffding and Robbins [2] have recently established an interesting
extension of the Central Limit theorem for dependent variables. Defining

$$E(Y_i) = \mu_i$$

and

$$A_i = \operatorname{var}(Y_{i+M}) + 2 \sum_{j=1}^{M} \operatorname{cov}(Y_{i+M-j},\, Y_{i+M}),$$

they prove the theorem:

If in a sequence of N random variables, $Y_{i+j}$ is independent of $Y_i$ for
$j > M$ and all $i$, and the variance and third absolute moment exist for
each member of the sequence, and

$$\lim_{p \to \infty} \frac{1}{p} \sum_{h=1}^{p} A_{i+h} = A \quad \text{exists for all } i \text{ uniformly},$$

then as $N \to \infty$, the sum $(Y_1 + \cdots + Y_N)$ is asymptotically normally
distributed with mean $\mu = \sum_{i=1}^{N} \mu_i$ and variance $NA$. This result
applies immediately here to demonstrate the asymptotic normality of D.


4. The alternative hypothesis: normal regression 


We now suppose that the observations are spaced at equal time in- 
tervals, as is commonly the case. We lose no generality by making these 
time intervals of unit length. The model for a normal regression on a 
“fixed” time variable is then 


$$y_k = \alpha + \beta k + x_k$$


where $x_k$ is a normal variate, with mean zero and variance $\sigma^2$ (for
all k), and $x_k$ is uncorrelated with $x_l$ when $k \ne l$. It follows
immediately that $(y_k - y_{k-1})$ is a normal variate with mean $\beta$ and
variance $2\sigma^2$, and that $(y_k - y_{k-1})$ and $(y_{k+1} - y_k)$ are
bivariate normal with correlation $-\tfrac{1}{2}$, while $(y_k - y_{k-1})$ is
uncorrelated with any $(y_{k+r} - y_{k+r-1})$ for $|r| > 1$.

Now equations (1) still hold under this (or any other) alternative
hypothesis. We have only to evaluate the probabilities:


$P_A$, that $D_i = 1$
$P_B$, that $D_i = D_{i+1} = 1$


and these are clearly functions of $\beta$ and $\sigma$, and, since $\sigma$ is
simply a scale parameter, we can reduce them to a function of
$R = \beta/(\sigma\sqrt{2})$. To evaluate the probabilities $P_A$ and $P_B$, we
require tables of the normal integral and of the bivariate normal integral
when the correlation coefficient $\rho = -\tfrac{1}{2}$. These are given by
Kelley [3] and Pearson [7] respectively. We give the values in Table 1.











TABLE 1

R = β/(σ√2)     P_A       P_B        var D

    0          .5000    .166667    .083333N + .083333
    .1         .5398    .209238    .084124N + .080168
    .2         .5793    .256736    .086007N + .071697
    .3         .6179    .308312    .089124N + .057852
    .4         .6554    .362879    .092511N + .040829
    .5         .6915    .419222    .095428N + .022472

   1.0         .8413    .686482    .090906N - .048298





From (1), we now have that, for the alternative hypothesis,

$$
\begin{aligned}
E(D) &= (N-1)P_A, \\
\operatorname{var} D &= (N-1)P_A(1-P_A) + 2(N-2)(P_B - P_A^2).
\end{aligned}
$$


The variance is tabulated in the last column of Table 1, the first line 
of which corresponds to the null hypothesis. The asymptotic normality 
of D under the alternative hypothesis again follows from the Hoeffding- 
Robbins theorem. If we assume exact normality for both the null and 
alternative hypothesis distributions (Moore and Wallis showed the ap- 
proximation to be satisfactory for N = 12 in the null case), we can tabu- 
late the approximate power of the test against this alternative. 
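The quantities in Table 1 can be approximated without printed tables. The
sketch below is an illustration under our own assumptions, not the method
used in the paper (which reads $P_A$ and $P_B$ from the tables of Kelley and
Pearson): it takes $P_A$ as the normal integral at R and obtains $P_B$ by
Simpson's rule, integrating the conditional normal probability for a
standardized pair with correlation $-\tfrac{1}{2}$. The function names and
the integration settings (`lower`, `steps`) are hypothetical choices.

```python
import math


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def p_a(r):
    """P(D_i = 1): one standardized difference exceeds -R."""
    return normal_cdf(r)


def p_b(r, rho=-0.5, lower=-10.0, steps=4000):
    """P(D_i = D_{i+1} = 1) for adjacent differences with correlation rho.

    By symmetry this equals P(Z1 < r, Z2 < r) for a standard bivariate
    normal pair; Z2 given Z1 = z is normal with mean rho*z and variance
    1 - rho^2. Simpson's rule is applied on [lower, r]; steps must be even.
    """
    s = math.sqrt(1.0 - rho * rho)
    h = (r - lower) / steps
    total = 0.0
    for k in range(steps + 1):
        z = lower + k * h
        weight = 1 if k in (0, steps) else (4 if k % 2 else 2)
        total += weight * normal_pdf(z) * normal_cdf((r - rho * z) / s)
    return total * h / 3.0


def var_d_coefficients(r):
    """Return (slope, intercept) so that var D = slope * N + intercept."""
    pa, pb = p_a(r), p_b(r)
    a = pa * (1.0 - pa)      # var D_i
    b = pb - pa * pa         # cov(D_i, D_{i+1})
    return a + 2.0 * b, -a - 4.0 * b


if __name__ == "__main__":
    for r in (0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 1.0):
        print(r, round(p_a(r), 4), round(p_b(r), 6), var_d_coefficients(r))
```

The variance coefficients computed this way may differ from the last column
of Table 1 in the later decimals, since the tabulated coefficients appear to
be based on $P_A$ rounded to four places.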


5. The power of the D-test against the normal regression alternative 


We consider only the case of an upward trend ($\beta > 0$), since tests
against downward trend can be carried out by changing the signs of the
observations and proceeding as for upward trend. The alternative
hypothesis ($\beta > 0$) then implies a higher value of D than does the null
hypothesis, so that we shall reject the null hypothesis whenever
$D \ge \tfrac{1}{2}(N-1) + \alpha[(N+1)/12]^{1/2}$, where $\alpha$ is the
appropriate normal deviate for the level of significance desired, which we
assume to be 95%. We then obtain the power of the test at the 95% level by
first solving for Z the equation

$$\tfrac{1}{2}(N-1) + 1.64485\left(\frac{N+1}{12}\right)^{1/2} = (N-1)P_A + Z\sqrt{\operatorname{var} D}$$

where the values on the right-hand side are obtained from Table 1 for
successive values of R. We then need only find, for each value of N
desired,

$$P = \int_{Z}^{\infty} \{2\pi \exp(y^2)\}^{-1/2}\, dy$$

which is the required power. The results are given in Table 2. The test
is clearly consistent, i.e., for any R the power tends to unity with
increasing N. The values in Table 2 are to be compared with the power


TABLE 2


APPROXIMATE POWER OF THE D-TEST AGAINST A NORMAL
REGRESSION ALTERNATIVE AT THE 95% LEVEL


N \ β/(σ√2)    .1     .2     .3     .4     .5    1.0

     15        .12    .25    .42    .59    .74    .99
     25        .16    .36    .61    .80    .92   1.00
     40        .21    .51    .79    .94    .99   1.00
     50        .24    .59    .87    .97   1.00   1.00
     75        .32    .75    .96   1.00   1.00   1.00
    100        .39    .85    .99   1.00   1.00   1.00





of the best available test, based on the distribution of the sample
regression coefficient b, which is exactly normal with mean $\beta$ and
variance $\sigma^2 / \sum (t - \bar{t})^2$. In this case, as the time-intervals
are equally spaced, we find that $\sum (t - \bar{t})^2 = \tfrac{1}{12}N(N^2-1)$,
so that the variance of b is of order $N^{-3}$. (This is an example of the
value, in regression problems, of equal spacing of the “fixed” variable.) As
a consequence of this, the power of the exact parametric test becomes
effectively unity if $R \ge .1$ and $N \ge 12$, and even for R = .05 when N
exceeds 25. It is clear, therefore, that there is a substantial loss of
power involved in using the D test in normal regression situations when R
or N is small. But Table 2 shows that when $N \ge 25$ and $R \ge .5$, or when
$N \ge 75$ and $R \ge .3$, the loss of power is at most 8 per cent, and the
test recommends itself for such situations. A general approach to the
problem of evaluating the power of tests based on runs up and down, of
which the test considered here is an example, is given by Levene [4].
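The calculation behind Table 2 can be sketched as follows. This is an
illustration only: it assumes, as the text does, that the normal
approximations are exact, and it takes $P_A$ and $P_B$ directly from Table 1.
The names `TABLE_1` and `d_test_power` are ours.

```python
import math

# (P_A, P_B) for each tabulated R, copied from Table 1.
TABLE_1 = {
    0.1: (0.5398, 0.209238),
    0.2: (0.5793, 0.256736),
    0.3: (0.6179, 0.308312),
    0.4: (0.6554, 0.362879),
    0.5: (0.6915, 0.419222),
    1.0: (0.8413, 0.686482),
}


def upper_tail(z):
    """P(standard normal variate > z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def d_test_power(n, r, deviate=1.64485):
    """Approximate power of the D-test for series length n and tabulated R."""
    pa, pb = TABLE_1[r]
    # Rejection point under the null hypothesis, from (3).
    critical = 0.5 * (n - 1) + deviate * math.sqrt((n + 1) / 12.0)
    # Mean and variance of D under the alternative.
    mean = (n - 1) * pa
    var = (n - 1) * pa * (1.0 - pa) + 2.0 * (n - 2) * (pb - pa * pa)
    z = (critical - mean) / math.sqrt(var)
    return upper_tail(z)


if __name__ == "__main__":
    for n in (15, 25, 40, 50, 75, 100):
        print(n, [round(d_test_power(n, r), 2) for r in sorted(TABLE_1)])
```

Each printed row can be compared with the corresponding row of Table 2.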


6. The bivariate difference-sign test 


The statistic D tests the serial independence of a single series of 
observations, but it can easily be generalized to the case of two series 
of observations in time when we wish to investigate the correlation
between movements in the two series. As before, the observations in each
series of N observations are assumed to be unequal. We write down the 
first differences of each series, one set under the other, and score one 
whenever the signs agree and zero otherwise. For example, we might 
have 


++¢<-4 <-<-<- 4 = 
ph maw fo om | > 
for two series of 10 observations, and the score would be 5 out of a 


possible 9. 
Formally, if we define $D_i$ for the first series as before, and $d_i$ for
the second series in an exactly analogous fashion, we define a score

$$C_i = \begin{cases} 1 & \text{if } D_i = d_i, \\ 0 & \text{otherwise,} \end{cases}$$

and our correlation statistic is

$$C = \sum_{i=1}^{N-1} C_i.$$

C, like D in the univariate case, ranges from zero to (N - 1).

The null hypothesis is now that our N pairs of observations were 
drawn from the same continuous bivariate population in which the 
two variates concerned, say X and Y, are independent. On this
hypothesis each of the $(N!)^2$ possible ways of pairing the two series, and
then ordering the pairs, is equally probable.


7. The distribution of C on the null hypothesis 
Equations (1) for the univariate case hold here also, with C substituted
everywhere for D, and precisely as before these become

$$
\begin{aligned}
E(C) &= (N-1)E(C_i), \\
\operatorname{var} C &= (N-1)\operatorname{var} C_i + 2(N-2)\operatorname{cov}(C_i, C_{i+1}).
\end{aligned}
\tag{4}
$$

Once more, we find that

$$E(C_i) = \tfrac{1}{2}, \qquad \operatorname{var}(C_i) = \tfrac{1}{4}, \tag{5}$$

but to find $\operatorname{cov}(C_i, C_{i+1})$, we must consider two sets of the 6
possible permutations of three unequal numbers, and form the 36 possible
combinations of one from each set. The pattern of difference signs for each
set is:

Permutation    Signs
    123         + +
    132         + -
    213         - +
    231         + -
    312         - +
    321         - -


If we compare each of these six sign patterns in turn with each of the
others and with itself, we obtain the 36 possibilities. We find that the
first and the last patterns agree exactly only with themselves, while
each of the other four patterns agrees exactly with itself and one of the
others. There are thus 2 + 8 = 10 cases out of 36 in which
$C_i = C_{i+1} = 1$, and $C_i C_{i+1} = 1$ only when this is so. Thus

$$E(C_i C_{i+1}) = \frac{10}{36}$$

and

$$\operatorname{cov}(C_i, C_{i+1}) = \frac{10}{36} - \left(\frac{1}{2}\right)^2 = \frac{1}{36}. \tag{6}$$

Substituting (5) and (6) in (4), we obtain

$$
\begin{aligned}
E(C) &= \tfrac{1}{2}(N-1), \\
\operatorname{var} C &= \tfrac{1}{4}(N-1) + \tfrac{2}{36}(N-2) = \tfrac{1}{36}(11N - 13).
\end{aligned}
\tag{7}
$$
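The counting argument above, and the resulting moments in (7), can be
checked by enumeration for small N. The sketch below assumes only the
definition of C as the number of agreeing difference signs; the function
names are ours. It pairs every ordering of the first series with every
ordering of the second, which corresponds to the $(N!)^2$ equally probable
arrangements of the null hypothesis.

```python
from fractions import Fraction
from itertools import permutations


def sign_pattern(perm):
    """True where a first difference is positive, False where negative."""
    return tuple(b > a for a, b in zip(perm, perm[1:]))


def agreement_moments(n):
    """Exact null mean and variance of C for two series of n unequal values."""
    patterns = [sign_pattern(p) for p in permutations(range(n))]
    scores = [
        sum(1 for x, y in zip(p, q) if x == y)
        for p in patterns
        for q in patterns
    ]
    total = len(scores)
    mean = Fraction(sum(scores), total)
    second_moment = Fraction(sum(s * s for s in scores), total)
    return mean, second_moment - mean ** 2


def joint_agreements_of_three():
    """Number of the 36 pattern pairs (N = 3) in which both signs agree."""
    patterns = [sign_pattern(p) for p in permutations(range(3))]
    return sum(1 for p in patterns for q in patterns if p == q)


if __name__ == "__main__":
    print(joint_agreements_of_three())
    for n in (3, 4, 5):
        print(n, agreement_moments(n), Fraction(11 * n - 13, 36))
```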


8. The alternative hypothesis 


In this bivariate case, we consider the alternative hypothesis that the
N pairs of observations were drawn from a bivariate normal population
with correlation parameter $\rho$. (The null hypothesis is then the special
case when $\rho = 0$.)

In discussing the C-statistic from another viewpoint, that of estimating
the correlation parameter of a bivariate normal population, Chown
and Moran [1] have pointed out that it follows very simply from the
sampling distribution of Kendall’s rank correlation coefficient that for
samples from a bivariate normal population

$$
\begin{aligned}
E(C) &= \frac{N-1}{2}\left(1 + \frac{2}{\pi} \arcsin \rho\right), \\
\operatorname{var} C &= \frac{11N-13}{36} - \frac{1}{\pi^2}\left\{(N-1)(\arcsin \rho)^2
  + 2(N-2)(\arcsin \tfrac{1}{2}\rho)^2\right\}.
\end{aligned}
$$

C is asymptotically normal in virtue of the Hoeffding-Robbins theorem,
on either the null or alternative hypothesis.


9. The power of the C-test against a bivariate normal alternative 


For any specified value of $\rho$ (> 0) we can now find the power of the
C-test, assuming, as we did for D, that the approximate normal
distributions are exact. Tests against $\rho < 0$ can be carried out by
changing the signs of one series of observations. We must first solve for Z
the equation

$$
\begin{aligned}
\tfrac{1}{2}(N-1) + 1.64485\left(\frac{11N-13}{36}\right)^{1/2}
 &= \frac{N-1}{2}\left(1 + \frac{2}{\pi}\arcsin\rho\right) \\
 &\quad + Z\left\{\frac{11N-13}{36} - \frac{1}{\pi^2}\left[(N-1)(\arcsin\rho)^2
   + 2(N-2)(\arcsin\tfrac{1}{2}\rho)^2\right]\right\}^{1/2}
\end{aligned}
$$

and then find P, as before, for each value of N to be considered. The
results are given in Table 3.


TABLE 3


APPROXIMATE POWER OF THE C-TEST AGAINST A BIVARIATE
NORMAL ALTERNATIVE AT THE 95% LEVEL


 N \ ρ     .1     .2     .3     .4     .5     .6     .7

   50     .11    .20    .34    .51    .69    .86    .96
  100     .14    .31    .54    .77    .93    .99   1.00
  200     .20    .50    .80    .96   1.00   1.00   1.00
  400     .31    .75    .97   1.00   1.00   1.00   1.00
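The entries of Table 3 follow from the equation of this section once the
normal approximations are taken as exact. The following sketch shows the
arithmetic; the function names are ours, and the only inputs are N, the
correlation, and the 95% normal deviate 1.64485 used in the text.

```python
import math


def upper_tail(z):
    """P(standard normal variate > z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def c_test_power(n, rho, deviate=1.64485):
    """Approximate power of the C-test against bivariate normal correlation rho."""
    null_var = (11.0 * n - 13.0) / 36.0
    # Rejection point under the null hypothesis, from (7).
    critical = 0.5 * (n - 1) + deviate * math.sqrt(null_var)
    # Mean and variance of C under the alternative (section 8).
    mean = 0.5 * (n - 1) * (1.0 + 2.0 / math.pi * math.asin(rho))
    var = null_var - (
        (n - 1) * math.asin(rho) ** 2
        + 2.0 * (n - 2) * math.asin(0.5 * rho) ** 2
    ) / math.pi ** 2
    z = (critical - mean) / math.sqrt(var)
    return upper_tail(z)


if __name__ == "__main__":
    for n in (50, 100, 200, 400):
        row = [round(c_test_power(n, k / 10.0), 2) for k in range(1, 8)]
        print(n, row)
```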





We could test $\rho = 0$ by using Fisher’s transformation¹
$z = \tfrac{1}{2} \log\{(1+r)/(1-r)\}$. Since z is approximately normally
distributed with mean $\zeta = \tfrac{1}{2} \log\{(1+\rho)/(1-\rho)\}$ and
variance $1/(N-3)$, the approximate power of the z-test at the 95% level is
given by solving for x the equation

$$x = 1.64485 - \zeta\sqrt{N-3}$$

and calculating P as before from the normal distribution function. The
results are given in Table 4.

¹ We consider the z-transformation rather than the exact Student's test for
$\rho = 0$ because the power of the former against $\rho \ne 0$ is easier to
compute, and it is commonly used despite its approximate nature.


TABLE 4


APPROXIMATE POWER OF FISHER’S z-TRANSFORMATION
AS A TEST OF ρ = 0 AT THE 95% LEVEL


 N \ ρ     .1     .2     .3     .4     .5

   50     .17    .40    .69    .90    .98
  100     .26    .64    .92    .99   1.00
  200     .41    .89   1.00   1.00   1.00
  400     .64    .99   1.00   1.00   1.00





Comparison of Tables 3 and 4 shows that the loss of power resulting
from the use of C is considerable. Nevertheless, its performance is
clearly satisfactory for $\rho \ge .5$, $N \ge 100$; or $\rho \ge .4$,
$N \ge 200$; or $\rho \ge .3$, N = 400, when at most 8% of the power is lost.
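The power of the z-test used for this comparison is a one-line calculation.
The sketch below assumes the approximate normality of z stated above; the
function names are ours.

```python
import math


def upper_tail(x):
    """P(standard normal variate > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def fisher_z_power(n, rho, deviate=1.64485):
    """Approximate power of the one-sided z-test of rho = 0 at the 95% level."""
    zeta = 0.5 * math.log((1.0 + rho) / (1.0 - rho))
    x = deviate - zeta * math.sqrt(n - 3)
    return upper_tail(x)


if __name__ == "__main__":
    for n in (50, 100, 200, 400):
        print(n, [round(fisher_z_power(n, k / 10.0), 2) for k in range(1, 6)])
```

Setting the printed rows beside those of Table 3 gives the loss of power
discussed above.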
NUMERICAL TABULATION OF THE DISTRIBUTION OF
KOLMOGOROV’S STATISTIC FOR FINITE 
SAMPLE SIZE 


Z. W. BIRNBAUM*
University of Washington and Stanford University 
1. Introduction 


LET X be a random variable with the continuous probability
distribution function

$$F(x) = \operatorname{Prob}\{X \le x\},$$

and let $X_1, X_2, \cdots, X_N$ be a sample of size N for X, ordered so that
$X_1 \le X_2 \le \cdots \le X_N$. We define the empirical distribution function
$F_N(x)$ by

$$
F_N(x) =
\begin{cases}
0 & \text{for } x < X_1, \\
k/N & \text{for } X_k \le x < X_{k+1},\ k = 1, 2, \cdots, N-1, \\
1 & \text{for } x \ge X_N.
\end{cases}
$$


The empirical distribution function is a step-function with N jumps, 
each of height 1/N, occurring at the points of the sample. 

One would expect that, for N large, $F_N(x)$ will very likely be close
to F(x). In 1933, Kolmogorov [1] introduced the statistic

$$D_N = \text{least upper bound of } |F(x) - F_N(x)|$$

which measures the greatest absolute discrepancy between F(x) and
$F_N(x)$, and showed that it has the following properties which make it
particularly useful for judging how “close” $F_N(x)$ is to F(x):


1) the probability distribution of $D_N$ depends on N but is independent
of F(x) ($D_N$ is a “distribution-free” statistic)

2) for N large, the probability distribution of $D_N$ is given by the
relationship

$$\lim_{N \to \infty} \operatorname{Prob}\left\{D_N < \frac{z}{\sqrt{N}}\right\}
 = 1 - 2\sum_{j=1}^{\infty} (-1)^{j-1} e^{-2j^2z^2} = L(z). \tag{1.1}$$
The function L(z) has been tabulated by Smirnov [2].¹ A new proof
of (1.1) has been given recently by Feller [3] and a heuristic outline of a
proof by Doob [4].

* Research done under the sponsorship of the Office of Naval Research.
¹ The expression for L(z) in [2] contains a misprint: $e^{-j^2z^2}$ instead of $e^{-2j^2z^2}$.
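The two objects just defined are simple to compute. The following sketch is
our own illustration, with hypothetical function names: it evaluates the
statistic for a sample against a given continuous distribution function,
using the fact that the least upper bound is attained at a jump of the step
function, and it sums the series in (1.1) for L(z) (truncated at a fixed
number of terms, which is an assumption adequate for moderate z).

```python
import math


def kolmogorov_statistic(sample, cdf):
    """Greatest absolute discrepancy between cdf and the empirical
    distribution function of the sample."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for k, x in enumerate(xs, start=1):
        f = cdf(x)
        # The step function jumps from (k-1)/n to k/n at the k-th point.
        d = max(d, k / n - f, f - (k - 1) / n)
    return d


def kolmogorov_limit(z, terms=100):
    """Asymptotic distribution function L(z) of sqrt(N) * D_N, from (1.1)."""
    if z <= 0.0:
        return 0.0
    series = sum(
        (-1) ** (j - 1) * math.exp(-2.0 * j * j * z * z)
        for j in range(1, terms + 1)
    )
    return 1.0 - 2.0 * series


if __name__ == "__main__":
    # Two observations tested against the uniform distribution on (0, 1).
    print(kolmogorov_statistic([0.25, 0.75], lambda x: x))
    print(kolmogorov_limit(1.3581), kolmogorov_limit(1.6276))
```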

The asymptotic distribution (1.1) makes it possible to use the
statistic $D_N$ for testing the hypothesis that a large sample was obtained
from a random variable X with a distribution function F(x) which is
explicitly given; it also may be used for constructing a “confidence-
band” about the empirical distribution function $F_N(x)$ so that it can
be asserted on a preassigned probability level that the unknown “true”
distribution function F(x) is entirely contained in that band. In either
type of application a difficulty arises due to the fact that the known
proofs of (1.1) give no indication how large N must be to make this
approximation sufficiently close for practical use. An obvious way to
overcome this difficulty is to compute numerically and tabulate the
probability distribution of $D_N$ for finite N up to values for which a
good agreement is reached with the asymptotic formula (1.1). An
adaptation of Feller’s argument for such a computation was proposed
in [5].

Kolmogorov, in his original paper [1], derived a system of recursion 
formulas which make it possible to compute for any finite N the prob- 
abilities 


$$\operatorname{Prob}\{D_N < c/N\} \quad \text{for } c = 1, 2, \cdots, N.$$


These formulas were used to compute Table 1 of the present paper. 
They are reproduced as (A 1.1)-(A 1.4) in the Appendix where the 
theory of the computations is presented. 

Massey [6] obtained a system of recursive formulas, equivalent with
(A 1.1)-(A 1.4), as well as a procedure for replacing them by a system
of difference equations. He tabulated Prob $\{D_N < c/N\}$ for N = 5 (5) 80
and selected values of c<9; there is, however, no estimate given of the
error resulting from the large number of computations needed to
obtain every result in this tabulation. A table of $100\alpha$% percentage
points was also given by Massey [7], for $\alpha$ = .20, .15, .10, .05, .01 and
N = 1 (1) 35, to two significant digits.

Table 1 of the present paper contains values of Prob $\{D_N < c/N\}$,
computed to five decimals, for N=1 (1) 100 and c=1 (1) 15. The 
method of computation used involves a “truncation” of Kolmogorov’s 
recursion formulas (A 1.1)-(A 1.4), and has made it possible to reduce 
the number of computations needed and to obtain estimates of the 
errors due to the truncation and to the accumulated effect of round-offs 
on a digital computing machine. 
Table 2 contains the 95% points of the distribution of $D_N$ for
N=2 (1) 5 (5) 30 (10) 100, and the 99% points for N =2 (1) 5 (5) 30 
(10) 80, as well as a comparison with the corresponding values ob- 
tained from the asymptotic formula (1.1). 

A comparison of Table 1 with the values tabulated by Massey in [6] 
shows agreement except for a few entries, particularly that for N = 5,
c = 2. Similarly a comparison of Table 2 with Massey’s table in [7]
discloses only minor discrepancies, the largest being those at the 95%
point for N =25 and at the 99% point for N = 10, 20. 


2. Tabulation of Prob $\{D_N < c/N\}$


Table 1 below was computed on the U. S. Bureau of Standards West- 
ern Automatic Computer (SWAC), at the Institute for Numerical 
Analysis.2 The computation was programmed according to formulas 
(A 3.1), (A 3.2), (A 3.3) of the Appendix, modified for a binary com- 
puter; the truncation was performed at r= 12, and the rounding off was 
carried out at t′ = 35 binary digits, which corresponds to about t = 10.53
for decimal digits. This should assure everywhere an error less than 
5-10-*. The final results were rounded off to 5 decimals. An alternative 
set of formulas was used for a check. 


3. Table of 95% and 99% points 
By $\epsilon_{N,.95}$ and $\epsilon_{N,.99}$ we denote the solutions of the
equations

$$
\begin{aligned}
P(D_N < \epsilon_{N,.95}) &= .95, \\
P(D_N < \epsilon_{N,.99}) &= .99.
\end{aligned}
$$

Table 2 contains in columns (2) and (3) values of $\epsilon_{N,.95}$ and
$\epsilon_{N,.99}$, to 4 decimals. Columns (4) and (5) contain the values

$$\tilde{\epsilon}_{N,.95} = 1.3581 \cdot N^{-1/2} \quad \text{and} \quad
  \tilde{\epsilon}_{N,.99} = 1.6276 \cdot N^{-1/2},$$

which are the asymptotic 95%- and 99%-points computed according
to (1.1). The quotients $\tilde{\epsilon}_{N,.95}/\epsilon_{N,.95}$ and
$\tilde{\epsilon}_{N,.99}/\epsilon_{N,.99}$ tabulated in columns (6) and (7)
indicate the manner in which these asymptotic values approach the exact
values with increasing N. It appears, in particular, that the asymptotic
values are always greater than the exact ones and that for N = 80 the
approximation by (1.1) is already quite good.
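The constants 1.3581 and 1.6276 in columns (4) and (5) are the points at
which L(z) in (1.1) equals .95 and .99. As a hedged illustration, the sketch
below recovers them by bisection and forms the asymptotic percentage points
for a given N. The function names, the bracketing interval and the
truncation of the series are our assumptions, not part of the paper's
computation.

```python
import math


def kolmogorov_limit(z, terms=100):
    """L(z) from (1.1), with the series truncated at a fixed number of terms."""
    if z <= 0.0:
        return 0.0
    series = sum(
        (-1) ** (j - 1) * math.exp(-2.0 * j * j * z * z)
        for j in range(1, terms + 1)
    )
    return 1.0 - 2.0 * series


def asymptotic_point(level, n, lo=0.2, hi=3.0, tol=1e-10):
    """Solve L(z) = level by bisection and return z / sqrt(n)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kolmogorov_limit(mid) < level:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) / math.sqrt(n)


if __name__ == "__main__":
    print(asymptotic_point(0.95, 1), asymptotic_point(0.99, 1))
    for n in (10, 50, 100):
        print(n, round(asymptotic_point(0.95, n), 4))
```

Dividing such a value by the exact point in column (2) or (3) of Table 2
gives the quotient shown in column (6) or (7).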





2 The writer takes this occasion to acknowledge the assistance given him by the Institute for Nu- 
merical Analysis, and to express his gratitude in particular to Dr. F. S. Acton, Dr. Gertrude Blanch, and 
Mrs. Roselyn S. Lipkis for their help and advice. 








TABLE 1

Prob $\{D_N < c/N\}$

 c\N       1       2       3       4       5       6       7       8       9      10
   1 1.00000  .50000  .22222  .09375  .03840  .01543  .00612  .00240  .00094  .00036
   2         1.00000  .92593  .81250  .69120  .57656  .47446  .38659  .31261  .25128
   3                 1.00000  .99219  .96992  .93441  .88937  .83842  .78442  .72946
   4                         1.00000  .99936  .99623  .98911  .97741  .96121  .94101
   5                                 1.00000  .99996  .99960  .99849  .99615  .99222
   6                                         1.00000 1.00000  .99996  .99982  .99943
   7                                                         1.00000 1.00000  .99998
   8                                                                         1.00000
 c\N      11      12      13      14      15      16      17      18      19      20
   1  .00014  .00005  .00002  .00001  .00000  .00000  .00000  .00000  .00000  .00000
   2  .20100  .16014  .12715  .10066  .07950  .06265  .04927  .03869  .03033  .02374
   3  .67502  .62209  .57136  .52323  .47795  .43564  .39630  .35991  .32636  .29553
   4  .91747  .89126  .86304  .83337  .80275  .77158  .74019  .70887  .67784  .64728
   5  .98648  .97885  .96935  .95807  .94517  .93081  .91517  .89844  .88079  .86237
   6  .99865  .99732  .99530  .99250  .98882  .98425  .97875  .97235  .96506  .95693
   7  .99993  .99979  .99953  .99908  .99837  .99736  .99598  .99419  .99195  .98924
   8 1.00000  .99999  .99997  .99993  .99984  .99968  .99944  .99907  .99856  .99788
   9         1.00000 1.00000 1.00000  .99999  .99997  .99994  .99989  .99980  .99968
  10                                 1.00000 1.00000 1.00000  .99999  .99998  .99996
  11                                                         1.00000 1.00000 1.00000
 c\N      21      22      23      24      25      26      27      28      29      30
   2  .01857  .01450  .01132  .00882  .00687  .00535  .00416  .00323  .00251  .00195
   3  .26729  .24147  .21793  .19650  .17702  .15935  .14334  .12885  .11575  .10392
   4  .61733  .58811  .55970  .53216  .50554  .47987  .45517  .43145  .40870  .38693
   5  .84335  .82386  .80401  .78392  .76368  .74338  .72309  .70288  .68280  .66290
   6  .94802  .93837  .92805  .91712  .90565  .89368  .88128  .86851  .85541  .84203
   7  .98605  .98236  .97817  .97349  .96832  .96269  .95661  .95010  .94318  .93588
   8  .99700  .99590  .99456  .99296  .99110  .98895  .98651  .98378  .98076  .97745
   9  .99949  .99924  .99890  .99846  .99792  .99725  .99645  .99551  .99441  .99315
  10  .99993  .99989  .99982  .99973  .99960  .99943  .99921  .99894  .99861  .99821
  11  .99999  .99999  .99998  .99996  .99994  .99990  .99985  .99979  .99971  .99960
  12 1.00000 1.00000 1.00000 1.00000  .99999  .99999  .99998  .99997  .99995  .99992
  13                                 1.00000 1.00000 1.00000 1.00000  .99999  .99999
  14                                                                 1.00000 1.00000
 c\N      31      32      33      34      35      36      37      38      39      40
   2  .00151  .00117  .00091  .00070  .00054  .00042  .00033  .00025  .00020  .00015
   3  .09325  .08363  .07497  .06717  .06016  .05386  .04820  .04312  .03856  .03448
   4  .36612  .34624  .32729  .30923  .29205  .27570  .26018  .24544  .23145  .21819
   5  .64323  .62382  .60470  .58590  .56744  .54934  .53161  .51427  .49733  .48078
   6  .82843  .81463  .80069  .78663  .77250  .75831  .74410  .72990  .71572  .70159
   7  .92822  .92022  .91192  .90332  .89447  .88538  .87608  .86658  .85690  .84707
   8  .97384  .96995  .96578  .96134  .95664  .95168  .94648  .94104  .93539  .92952
   9  .99172  .99012  .98834  .98638  .98423  .98191  .97939  .97670  .97382  .97077
  10  .99773  .99717  .99652  .99578  .99494  .99399  .99294  .99178  .99050  .98910
  11  .99946  .99930  .99910  .99886  .99857  .99824  .99785  .99741  .99692  .99636
  12  .99989  .99985  .99980  .99973  .99965  .99954  .99942  .99928  .99911  .99891
  13  .99998  .99997  .99996  .99994  .99992  .99990  .99986  .99982  .99977  .99971
  14 1.00000 1.00000 1.00000  .99999  .99999  .99998  .99997  .99996  .99995  .99993
  15                         1.00000 1.00000 1.00000  .99999  .99999  .99999  .99999
TABLE 1 (Continued)

 c\N      41      42      43      44      45      46      47      48      49      50
   2  .00012  .00009  .00007  .00005  .00004  .00003  .00002  .00002  .00001  .00001
   3  .03081  .02753  .02459  .02196  .01960  .01750  .01561  .01393  .01242  .01108
   4  .20562  .19373  .18247  .17181  .16174  .15222  .14323  .13474  .12672  .11916
   5  .46464  .44891  .43359  .41868  .40418  .39008  .37639  .36310  .35020  .33769
   6  .68752  .67354  .65965  .64588  .63223  .61872  .60536  .59215  .57911  .56623
   7  .83711  .82702  .81684  .80657  .79623  .78583  .77539  .76492  .75442  .74392
   8  .92345  .91719  .91075  .90415  .89739  .89048  .88344  .87628  .86899  .86160
   9  .96754  .96413  .96056  .95682  .95293  .94888  .94467  .94033  .93584  .93122
  10  .98759  .98596  .98421  .98233  .98033  .97822  .97598  .97363  .97115  .96856
  11  .99573  .99504  .99428  .99344  .99253  .99154  .99047  .98933  .98810  .98679
  12  .99868  .99842  .99813  .99779  .99742  .99701  .99655  .99605  .99550  .99490
  13  .99963  .99955  .99945  .99933  .99919  .99904  .99886  .99866  .99844  .99820
  14  .99991  .99988  .99985  .99982  .99977  .99972  .99966  .99959  .99951  .99941
  15  .99998  .99997  .99996  .99995  .99994  .99993  .99991  .99988  .99986  .99983
 c\N      51      52      53      54      55      56      57      58      59      60
   2  .00001  .00001  .00001  .00000  .00000  .00000  .00000  .00000  .00000  .00000
   3  .00988  .00880  .00785  .00699  .00623  .00555  .00494  .00440  .00392  .00349
   4  .11203  .10530  .09896  .09298  .08735  .08205  .07706  .07236  .06793  .06377
   5  .32556  .31381  .30242  .29140  .28073  .27041  .26042  .25077  .24144  .23242
   6  .55353  .54101  .52868  .51654  .50459  .49283  .48128  .46992  .45876  .44780
   7  .73342  .72294  .71247  .70203  .69162  .68126  .67094  .66068  .65049  .64035
   8  .85412  .84654  .83889  .83116  .82337  .81552  .80762  .79968  .79171  .78370
   9  .92648  .92161  .91662  .91152  .90632  .90102  .89562  .89013  .88455  .87889
  10  .96586  .96304  .96011  .95708  .95393  .95069  .94734  .94390  .94036  .93674
  11  .98540  .98392  .98237  .98073  .97900  .97720  .97531  .97334  .97129  .96916
  12  .99425  .99356  .99280  .99200  .99113  .99022  .98924  .98821  .98712  .98598
  13  .99792  .99762  .99729  .99693  .99654  .99611  .99565  .99515  .99462  .99406
  14  .99931  .99919  .99906  .99891  .99875  .99857  .99837  .99815  .99791  .99765
  15  .99979  .99975  .99970  .99964  .99958  .99951  .99943  .99934  .99925  .99914
 c\N      61      62      63      64      65      66      67      68      69      70
   3  .00310  .00276  .00246  .00219  .00195  .00173  .00154  .00137  .00122  .00108
   4  .05986  .05617  .05271  .04946  .04640  .04352  .04082  .03828  .03589  .03365
   5  .22371  .21529  .20717  .19933  .19176  .18445  .17741  .17061  .16406  .15774
   6  .43705  .42649  .41614  .40599  .39603  .38628  .37672  .36736  .35819  .34921
   7  .63029  .62030  .61040  .60057  .59083  .58119  .57163  .56217  .55280  .54354
   8  .77567  .76761  .75955  .75148  .74340  .73533  .72726  .71919  .71115  .70311
   9  .87316  .86736  .86150  .85557  .84958  .84355  .83746  .83133  .82516  .81895
  10  .93302  .92921  .92533  .92136  .91731  .91320  .90901  .90475  .90042  .89604
  11  .96695  .96466  .96230  .95986  .95735  .95476  .95211  .94938  .94659  .94373
  12  .98477  .98351  .98218  .98080  .97936  .97786  .97630  .97469  .97301  .97128
  13  .99345  .99281  .99212  .99140  .99063  .98983  .98898  .98809  .98716  .98619
  14  .99737  .99707  .99674  .99639  .99602  .99562  .99519  .99474  .99425  .99374
  15  .99902  .99889  .99874  .99858  .99841  .99823  .99803  .99781  .99758  .99733
 c\N      71      72      73      74      75      76      77      78      79      80
   3  .00096  .00086  .00076  .00068  .00060  .00053  .00047  .00042  .00037  .00033
   4  .03155  .02958  .02772  .02598  .02435  .02282  .02138  .02003  .01877  .01758
   5  .15165  .14578  .14013  .13468  .12943  .12438  .11951  .11482  .11031  .10597
   6  .34043  .33183  .32342  .31519  .30714  .29928  .29159  .28407  .27672  .26955
   7  .53437  .52531  .51635  .50750  .49875  .49011  .48158  .47316  .46485  .45664
   8  .69510  .68712  .67916  .67123  .66333  .65546  .64764  .63985  .63211  .62441
   9  .81271  .80644  .80014  .79382  .78748  .78112  .77475  .76836  .76197  .75557
  10  .8915S  .88709  .88253  .87792  .87326  .86856  .86381  .85902  .85419  .84932
  11  .94080  .93781  .93476  .93165  .92848  .92525  .92197  .91864  .91525  .91182
  12  .96950  .96765  .96576  .96380  .96180  .95974  .95762  .95546  .95324  .95098
  13  .98518  .98412  .98302  .98187  .98069  .97946  .97819  .97687  .97552  .97412
  14  .99321  .99264  .99204  .99142  .99076  .99008  .98936  .98861  .98783  .98702
  15  .99707  .99678  .99648  .99616  .99582  .99546  .99508  .99468  .99426  .99382
 c\N      81      82      83      84      85      86      87      88      89      90
   3  .00030  .00026  .00023  .00021  .00018  .00016  .00015  .00013  .00011  .00010
   4  .01647  .01542  .01444  .01353  .01267  .01186  .01110  .01040  .00973  .00911
   5  .10178  .09776  .09389  .09017  .08659  .08314  .07983  .07664  .07357  .07063
   6  .26253  .25569  .24900  .24247  .23609  .22986  .22379  .21786  .21207  .20643
   7  .44855  .44056  .43269  .42493  .41727  .40973  .40229  .39497  .38775  .38064
   8  .61675  .60914  .60159  .59408  .58662  .57922  .57188  .56459  .55735  .55018
   9  .74917  .74276  .73636  .72996  .72356  .71717  .71079  .70442  .69806  .69172
  10  .84442  .83949  .83452  .82953  .82451  .81947  .81440  .80932  .80421  .79909
  11  .90833  .90480  .90123  .89761  .89395  .89025  .88651  .88273  .87892  .87507
  12  .94867  .94630  .94390  .94144  .93894  .93640  .93381  .93118  .92851  .92580
  13  .97268  .97119  .96967  .96811  .96650  .96486  .96317  .96145  .95969  .95789
  14  .98618  .98531  .98440  .98346  .98249  .98149  .98046  .97939  .97830  .97717
  15  .99336  .99287  .99237  .99184  .99129  .99071  .99011  .98949  .98884  .98818
N 91 92 93 94 95 96 97 98 99 100 

c 

1 

2 

3 

4 

5 

6 

7 

8 

9 
TABLE 2 

95%-POINTS $\varepsilon_{N,.95}$ AND 99%-POINTS $\varepsilon_{N,.99}$ 
FOR KOLMOGOROV’S STATISTIC 

Column headings: (1) $N$; (2) $\varepsilon_{N,.95}$; (3) $\varepsilon_{N,.99}$; (4) $\tilde\varepsilon_{N,.95}$; (5) $\tilde\varepsilon_{N,.99}$; (6) $\tilde\varepsilon_{N,.95}/\varepsilon_{N,.95}$; (7) $\tilde\varepsilon_{N,.99}/\varepsilon_{N,.99}$ 

(1)   (2)      (3)      (4)      (5)      (6)      (7) 
  2   .8419    .9293    .9612    1.1509   1.142    1.238 
  3   .7076    .8290    .7841    .9397    1.108    1.134 
  4   .6239    .7341    .6791    .8138    1.088    1.109 
  5   .5633    .6685    .6074    .7279    1.078    1.089 
 10   .4087    .4864    .4295    .5147    1.051    1.058 
 15   .3075    .4042    .3507    .4202    1.039    1.040 
 20   .2939    .3524    .3037    .3639    1.033    1.033 
 25   .2639    .3165    .2716    .3255    1.029    1.028 
 30   .2417    .2898    .2480    .2972    1.026    1.025 
 40   .2101    .2521    .2147    .2574    1.022    1.021 
 50   .1884    .2260    .1921    .2302    1.019    1.018 
 60   (illegible) .2067 .1753    .2101    1.018    1.016 
 70   .1597    .1917    .1623    .1945    1.016    1.015 
 80   .1496    .1795    .1518    .1820    1.015    1.014 
 90   .1412             .1432             1.014 
100   .1340             .1358             1.013 
4. Examples 


4.1. Determination of sample size needed. 


4.11. We wish to approximate $F(x)$ empirically by $F_N(x)$ so that the 
error is everywhere less than .15, on the 90% probability level. How 
large must the sample size $N$ be? To answer this question, we find by 
interpolation in Table 1 that $P\{D_{65} < .15\} > .900$, so that $N = 65$ is 
sufficient. 

4.12. An approximation to $F(x)$ by $F_N(x)$ is desired on the 99% 
probability level with an error less than .05 everywhere; what sample 
size is needed? An inspection of Table 1 shows that $N$ must be $> 100$, 
hence the asymptotic formula (1.1) will be used. The asymptotic 99% 
point, according to Section 3, is $1.6276 \cdot N^{-1/2}$, hence by setting this 
equal to .05 and solving for $N$ we find $N = 1060$. 
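
The arithmetic of 4.12 can be checked with a minimal Python sketch. The function name and the dictionary are illustrative only; the 99% multiplier 1.6276 is quoted in the text, while the 95% multiplier 1.3581 is an assumption read off column (4) of Table 2 (for example .1358 at N = 100).

```python
# Asymptotic percentage points have the form multiplier * N**-0.5.
# 1.6276 is quoted in 4.12; 1.3581 is inferred from Table 2, column (4).
ASYMPTOTIC_MULTIPLIER = {0.95: 1.3581, 0.99: 1.6276}

def asymptotic_sample_size(max_error, level=0.99):
    """Solve multiplier * N**-0.5 = max_error for N (large-N approximation)."""
    multiplier = ASYMPTOTIC_MULTIPLIER[level]
    return (multiplier / max_error) ** 2

if __name__ == "__main__":
    n = asymptotic_sample_size(0.05, 0.99)
    print(round(n))  # the text reports N = 1060
```

The result is only as good as the asymptotic formula, which is why the text uses it only after Table 1 has shown that N exceeds 100.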


4.2. Estimating probabilities. 
In Table 3, column (2) contains an ordered sample of a random 
variable $X$, consisting of the values $X_i$, $i = 1, 2, \ldots, 40$. The values in 
columns (3) and (4) are 

$$L(i) = \max\left(0, \frac{i}{40} - .2101\right)$$

and 

$$U(i) = \min\left(1, \frac{i}{40} + .2101\right)$$

for $i = 0, 1, 2, \ldots, 40$, where .2101 is the value of $\varepsilon_{40,.95}$ from Table 2. 
It can be asserted with probability .95 that the true continuous 
probability distribution function is everywhere contained in the “confidence 
band” defined by 

(4.2) $L(i) < F(x) < U(i)$ for $X_i \le x \le X_{i+1}$. 

Therefore, any number of statements of the following kinds may be 
made simultaneously on a probability level of at least .95: $P\{X < .7867\} 
= P\{X < X_{14}\}$ is a number between .1149 and .5351; $P\{.7867 < X 
< 1.5137\} = P\{X_{14} < X < X_{34}\}$ is a number between $L(34) - U(14) 
= .0798$ and $U(34) - L(14) = .8601$; $P\{X > 1.5677\} = 1 - P\{X \le X_{37}\}$ is 
less than $1 - L(37) = .2851$. Each of these statements separately could 
be made on a probability level higher than .95. 
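
The band limits and the three probability statements of 4.2 can be reproduced with a short Python sketch. The function name `band_limits` is hypothetical; the constants 40 and .2101 are those used in the text.

```python
def band_limits(i, n, eps):
    """Return (L(i), U(i)) for the confidence band of half-width eps."""
    lower = max(0.0, i / n - eps)
    upper = min(1.0, i / n + eps)
    return lower, upper

N, EPS = 40, 0.2101  # sample size and 95% point used in 4.2

if __name__ == "__main__":
    l13, u13 = band_limits(13, N, EPS)
    l14, u14 = band_limits(14, N, EPS)
    l34, u34 = band_limits(34, N, EPS)
    l37, _ = band_limits(37, N, EPS)
    print(round(l13, 4), round(u13, 4))              # bounds for P{X < X_14}
    print(round(l34 - u14, 4), round(u34 - l14, 4))  # bounds for P{X_14 < X < X_34}
    print(round(1 - l37, 4))                         # upper bound for P{X > X_37}
```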


4.3. Testing a completely specified hypothesis. 


We wish to test on the .95 probability level the hypothesis $H_0$ that 
the sample in column (2) of Table 4.2 above was obtained from a normal 
population with expectation 1 and standard deviation $1/\sqrt{6}$; we 
agree to reject $H_0$ if the probability function 

(4.31) $F_0(x) = \frac{\sqrt{6}}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-(1/2)\,6(X-1)^2}\,dX = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\sqrt{6}(x-1)} e^{-u^2/2}\,du$ 

is not entirely contained in the confidence band (4.2). 

For this purpose we may use the graphical procedure, in which the 
confidence band (4.2) is plotted, then a large number of values of 
$F_0(x)$ are computed from (4.31) and a graph of $F_0(x)$ is sketched, and 
finally $H_0$ is rejected when this graph reaches or crosses the lower or 
upper boundary of the confidence band. The obvious disadvantage of 
this procedure is that it requires the computation of many values of 
$F_0(x)$. 
TABLE 3 

DATA FOR EXAMPLES IN SECTIONS 4.2 AND 4.3 

(1)   (2)      (3)      (4) 
 i    X_i      L(i)     U(i) 
 1    .0475    .0000    .2351 
 2    .2153    .0000    .2601 
 3    .2287    .0000    .2851 
 4    .2824    .0000    .3101 
 5    .3743    .0000    .3351 
 6    .3868    .0000    .3601 
 7    .4421    .0000    .3851 
 8    .5033    .0000    .4101 
 9    .5945    .0149    .4351 
10    .6004    .0399    .4601 
11    .6255    .0649    .4851 
12    .6331    .0899    .5101 
13    .6478    .1149    .5351 
14    .7867    .1399    .5601 
15    .8878    .1649    .5851 
16    .8930    .1899    .6101 
17    .9335    .2149    .6351 
18    .9602    .2399    .6601 
19   1.0448    .2649    .6851 
20   1.0556    .2899    .7101 
21   1.0894    .3149    .7351 
22   1.0999    .3399    .7601 
23   1.1765    .3649    .7851 
24   1.2036    .3899    .8101 
25   1.2344    .4149    .8351 
26   1.2543    .4399    .8601 
27   1.2712    .4649    .8851 
28   1.3507    .4899    .9101 
29   1.3515    .5149    .9351 
30   1.3528    .5399    .9601 
31   1.3774    .5649    .9851 
32   1.4209    .5899   1.0000 
33   1.4304    .6149   1.0000 
34   1.5137    .6399   1.0000 
35   1.5288    .6649   1.0000 
36   1.5291    .6899   1.0000 
37   1.5677    .7149   1.0000 
38   1.7238    .7399   1.0000 
39   1.7919    .7649   1.0000 
40   1.8794    .7899   1.0000 
Another procedure is based on the fact that $F_0(x)$ can leave the 
confidence band (4.2) if and only if it leaves this confidence band at one 
of the sample points $X_i$, $i = 1, \ldots, N$, that is, if at least one of the 
inequalities 

(4.32) $L(i) < F_0(X_i) < U(i-1)$, $i = 1, 2, \ldots, N$ 

is violated. It would, therefore, be sufficient to compute $F_0(X_i)$ for all 
sample points $X_i$, and to reject $H_0$ if at least one of the inequalities 
(4.32) is not satisfied. Even this procedure has the disadvantage that 
it may require the computation of all the $L(i)$, $U(i-1)$ and $F_0(X_i)$. 

Compared with the preceding two, the following method saves a 
considerable amount of computation: 

We consider the sample values ordered increasingly, as in column (2) 
of Table 4.2, and compute 

$L(1) = .0000$, $F_0(X_1) = .0098$, $U(0) = .2101$. 

Since these three numbers satisfy (4.32), the smallest $X_i$ for which 
(4.32) could be violated must be such that either $L(i) \ge F_0(X_1)$ or 
$F_0(X_i) \ge U(1)$, that is either $i/40 - .2101 \ge .0098$ or $F_0(X_i) \ge 1/40 + .2101$, 
hence either $i \ge 8.796$ or $X_i \ge .7052$; this means that either $i \ge 9$ or, 
according to column (2) of Table 4.2, $i \ge 14$ is the earliest sample value 
to check for (4.32). We compute for $i = 9$: 

$L(9) = .0149$, $F_0(X_9) = .1603$, $U(8) = .4101$. 

Since these three numbers satisfy (4.32), the next smallest $X_i$ for which 
(4.32) could be false must be such that either $L(i) \ge F_0(X_9)$ or $F_0(X_i) 
\ge U(9)$, that is either $i/40 - .2101 \ge .1603$ or $F_0(X_i) \ge 9/40 + .2101$, hence 
either $i \ge 14.82$ or $X_i \ge .9052$; this means either $i \ge 15$ or, according to 
column (2), $i \ge 17$. We therefore compute for $i = 15$ 

$L(15) = .1649$, $F_0(X_{15}) = .3918$, $U(14) = .5601$ 

and note that (4.32) is verified. 

The next smallest $X_i$ for which (4.32) could be false must be such 
that either $L(i) \ge F_0(X_{15}) = .3918$ or $F_0(X_i) \ge U(15) = .5851$, that is 
$i \ge 24.08$ or $X_i \ge 1.0877$, hence $i \ge 25$ or $i \ge 21$. We compute for $i = 21$ 

$L(21) = .3149$, $F_0(X_{21}) = .5867$, $U(20) = .7101$, 

and see that (4.32) is verified. 
Continuing this procedure, we finish up by calculating only the 
values 

L(i)     U(i-1)   F_0(X_i)  U(i) 
.0000    .2101    .0098     .2351 
.0149    .4101    .1603     .4351 
.1649    .5601    .3918     .5851 
.3149    .7101    .5867     .7351 
.4649    .8601    .7468     .8851 
.6399   1.0000    .8958    1.0000 

and do not reject $H_0$ since (4.32) is satisfied for all these $i$. If at some 
step of this procedure (4.32) had not been satisfied, we would have 
rejected $H_0$ and stopped computing. This method appears particularly 
useful for large samples. 
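
The skipping procedure described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the author's own computing scheme: the function names are hypothetical, the data are column (2) of Table 3, and `statistics.NormalDist` from the standard library stands in for a table of the normal distribution.

```python
import math
from bisect import bisect_left
from statistics import NormalDist

# Column (2) of Table 3: the ordered sample, N = 40.
X = [0.0475, 0.2153, 0.2287, 0.2824, 0.3743, 0.3868, 0.4421, 0.5033,
     0.5945, 0.6004, 0.6255, 0.6331, 0.6478, 0.7867, 0.8878, 0.8930,
     0.9335, 0.9602, 1.0448, 1.0556, 1.0894, 1.0999, 1.1765, 1.2036,
     1.2344, 1.2543, 1.2712, 1.3507, 1.3515, 1.3528, 1.3774, 1.4209,
     1.4304, 1.5137, 1.5288, 1.5291, 1.5677, 1.7238, 1.7919, 1.8794]
EPS = 0.2101                                     # 95% point for N = 40
H0 = NormalDist(mu=1.0, sigma=1 / math.sqrt(6))  # hypothesis of 4.3

def lower(i, n, eps=EPS):
    return max(0.0, i / n - eps)

def upper(i, n, eps=EPS):
    return min(1.0, i / n + eps)

def skipping_test(xs, dist, eps=EPS):
    """Return (rejected, checked), where checked lists the 1-based indices used."""
    n = len(xs)
    checked = []
    i = 1
    while i <= n:
        f = dist.cdf(xs[i - 1])
        checked.append(i)
        if not (lower(i, n, eps) < f < upper(i - 1, n, eps)):
            return True, checked
        # Earliest j with L(j) >= F0(X_i), i.e. j/n - eps >= f.
        j_low = math.ceil(n * (f + eps))
        # Earliest j with F0(X_j) >= U(i); no such j once U(i) has reached 1.
        u = upper(i, n, eps)
        j_up = n + 1 if u >= 1.0 else bisect_left(xs, dist.inv_cdf(u)) + 1
        i = max(i + 1, min(j_low, j_up))
    return False, checked

if __name__ == "__main__":
    print(skipping_test(X, H0))
```

Between two checked indices no violation is possible, because $F_0$ is nondecreasing while $L$ and $U$ only grow with $i$; that is the whole justification for skipping.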


5. Other distribution-free statistics 


5.1. A number of distribution-free statistics have been studied which 
lend themselves to treating problems such as those illustrated in the 
preceding section. Without attempting an enumeration of such statistics 
and the techniques based on them, we should like to mention 
some of the more important among them and compare them briefly 
with Kolmogorov’s statistic $D_N$. 


5.2. The Chi-square. 


This well-known and extensively tabulated statistic is being used for 
testing completely specified hypotheses such as the one exemplified in 
4.3. The $\chi^2$ statistic becomes approximately distribution-free for $N \to \infty$ 
but is not distribution-free for finite $N$, and little is known about the 
manner in which its actual distribution for finite $N$ and given $F(x)$ is 
approximated by its limiting distribution. By contrast, $D_N$ is a 
distribution-free statistic for finite $N$ and its exact probability distribution 
is tabulated for finite $N$ (Table 1 of this paper) and for the asymptotic 
case [2]. 

Not enough is known about the power of either test to justify a 
preference for using the $\chi^2$ or $D_N$ for testing a completely specified 
hypothesis. The $\chi^2$ technique, however, requires grouping of data, while 
in applying $D_N$ one uses the individual observations; this suggests that 
the $D_N$ test may utilize the information better than the $\chi^2$ test. 

The $\chi^2$ statistic has the advantage that it can be used for testing the 
composite hypothesis that $F(x)$ belongs to a parametric family of 
distributions. This is due to the fact that under fairly general assumptions 
it is known how the probability distribution of $\chi^2$ is approximately 
affected when parameters are estimated from the sample (loss of one 
degree of freedom for each parameter estimated). No such knowledge is 
available for $D_N$. 

The statistic $D_N$ can be used for estimating an unknown $F(x)$ by a 
confidence band as illustrated in 4.2. Confidence regions obtained by 
using the $\chi^2$ have no simple intuitive meaning. 


5.3. Confidence bands with variable width. 


Wald and Wolfowitz [8] have developed a theory of distribution-free 
confidence bands more general than those defined by $D_N$. These 
confidence bands could, in particular, be constructed so that their width 
decreases towards the lower and the upper end of the distribution, 
which would be an improvement on $D_N$. Numerical tabulations, however, 
are not available for this theory, either for finite sample sizes or 
for the asymptotic case. 


5.4. One-sided confidence bands. 


A one-sided confidence band was proposed by Smirnov [9] who also 
gave an asymptotic expression for the corresponding probability dis- 
tribution. The exact probability distribution for finite sample size N 
was derived by Wald and Wolfowitz [8]. An alternative expression for 
the exact probability distribution was proposed by Birnbaum and 
Tingey [10] and was used to tabulate the 10%, 5%, 1% and .1% points 
for $N = 5, 8, 10, 20, 40, 50$. Since for $N = 50$ Smirnov’s asymptotic 
expression is already very good, the probability distribution for one-sided 
confidence bands is at present tabulated well enough for practical use. 
It can be used for a one-sided test of a completely specified hypothesis 
or for estimation of an unknown $F(x)$ by a one-sided confidence 
contour. 


5.5. Smirnov’s statistic. 


Modifying a statistic proposed by Cramér and von Mises, Smirnov 
[11] introduced the distribution-free statistic 

$$\omega_n^2 = \int_{-\infty}^{+\infty} [F_n(x) - F(x)]^2\, dF(x)$$

and derived an asymptotic expression for its probability distribution. 
This statistic could be used for testing completely specified hypotheses. 
No tabulation of its probability distribution is available.* 
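
For a completely specified continuous $F$, the integral defining $\omega_n^2$ reduces to a finite sum over the ordered sample. The Python sketch below uses the standard computing identity $\omega_n^2 = \frac{1}{12n^2} + \frac{1}{n}\sum_i \left(F(X_{(i)}) - \frac{2i-1}{2n}\right)^2$, which is not stated in the paper and is brought in here as an assumption; the function name is hypothetical.

```python
def cramer_von_mises_omega2(sample, cdf):
    """Integral of [F_n - F]^2 dF for a hypothesized continuous cdf."""
    u = sorted(cdf(x) for x in sample)  # F evaluated at the ordered sample
    n = len(u)
    s = sum((ui - (2 * i - 1) / (2 * n)) ** 2 for i, ui in enumerate(u, start=1))
    return 1 / (12 * n * n) + s / n

if __name__ == "__main__":
    # Uniform hypothesis on (0, 1), so the cdf is the identity.
    print(cramer_von_mises_omega2([0.5], lambda x: x))
```

For a single observation $u$ the direct integral is $u^3/3 + (1-u)^3/3$, which agrees with the identity and gives a quick hand check.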





* At the time of the printing of this paper, a table of the limiting distribution of $n\omega_n^2$ was 
published in T. W. Anderson and D. A. Darling, “Asymptotic Theory of Certain ‘Goodness of Fit’ Criteria 
Based on Stochastic Processes,” Annals of Mathematical Statistics, 23 (1952), 193-212. 
5.6. Sherman’s statistic. 

The distribution-free statistic 

$$\omega_n = \frac{1}{2} \sum_{i=1}^{n+1} \left| F(X_i) - F(X_{i-1}) - \frac{1}{n+1} \right|,$$

where $X_0 = -\infty$, $X_{n+1} = +\infty$, was introduced and studied by Sherman 
[12]. He derived its exact probability distribution for finite sample size 
$n$, and showed that this distribution is asymptotically normal. No 
tabulation is available for finite sample size. For large samples Sherman’s 
statistic can be used to test completely specified hypotheses. The 
calculation of $\omega_n$ appears more time-consuming than the use of $D_N$ illustrated in 
4.3. Not enough is known about the power of either test to justify a 
preference for a test based on Kolmogorov’s or on Sherman’s statistic. 
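
A minimal Python sketch of Sherman's statistic follows, assuming a completely specified continuous $F$ supplied as a function. The function name is hypothetical. The conventions $X_0 = -\infty$ and $X_{n+1} = +\infty$ are handled by padding the transformed sample with 0 and 1.

```python
def sherman_statistic(sample, cdf):
    """Half the summed absolute deviations of the spacings from 1/(n+1)."""
    n = len(sample)
    # F(X_0) = 0 and F(X_{n+1}) = 1 encode X_0 = -inf and X_{n+1} = +inf.
    u = [0.0] + sorted(cdf(x) for x in sample) + [1.0]
    return 0.5 * sum(abs(u[i] - u[i - 1] - 1 / (n + 1)) for i in range(1, n + 2))

if __name__ == "__main__":
    # Equally spaced values under a uniform hypothesis give a zero statistic.
    print(sherman_statistic([0.25, 0.5, 0.75], lambda x: x))
```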
APPENDIX 
A 1. Kolmogorov’s formulas 

438 AMERICAN STATISTICAL ASSOCIATION JOURNAL, SEPTEMBER 1952 

The following recursion formulas for computing Prob $\{D_N < c/N\}$ 
are those given by Kolmogorov in [1] except for minor changes in 
notation: 

(A 1.1) $\mathrm{Prob}\left\{D_N < \frac{c}{N}\right\} = \frac{N!}{N^N}\, e^N R_{0,N}(c),$ 

where $R_{i,k}(c)$ is defined for all integers $i$, all non-negative integers $k$, 
and $c = 1, 2, \ldots, N$, and 

(A 1.2) $R_{0,0}(c) = 1$, $R_{i,0}(c) = 0$ for $i \ne 0$ 
(A 1.3) $R_{i,k}(c) = 0$ for $|i| \ge c$ 
(A 1.4) $R_{i,k+1}(c) = e^{-1} \sum_{s=0}^{2c-1} \frac{1}{s!} R_{i+1-s,k}(c)$ for $|i| \le c - 1$. 


The change of notations for passing from (A 1.1)-(A 1.4) to Massey’s 
formulas in [6] may be summarized in the following “dictionary”: 

(A 1.1)-(A 1.4)       Massey 
$c$                   $k$ 
$N$                   $n$ 
$k$                   $m$ 
$c + i$               $j$ 
$i + c + 1 - s$       $h$ 
$i$                   $j - k$ 
$s$                   $j - h + 1$ 
$e^k R_{i,k}(c)$      $U_j(m)$ 
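
The recursion (A 1.1)-(A 1.4) can be sketched in Python as follows. This is an illustration, not the machine program used for Table 1: the function name is hypothetical, ordinary floating point replaces the fixed-point arithmetic analyzed in the following sections, and the code works with $e^k R_{i,k}(c)$ (the quantity paired with Massey's $U_j(m)$ in the dictionary) so that the factors $e^{-1}$ and $e^N$ cancel.

```python
import math

def kolmogorov_cdf(n, c):
    """P{D_N < c/N} for integers 1 <= c <= N, following (A 1.1)-(A 1.4)."""
    if not 1 <= c <= n:
        raise ValueError("need 1 <= c <= N")
    size = 2 * c - 1                  # indices i = -(c-1), ..., c-1
    inv_fact = [1 / math.factorial(s) for s in range(2 * c)]
    u = [0.0] * size
    u[c - 1] = 1.0                    # (A 1.2): R_{0,0} = 1, all others 0
    for _ in range(n):                # advance k = 0, 1, ..., N-1
        new = [0.0] * size
        for pos in range(size):       # pos = i + c - 1
            total = 0.0
            for s in range(2 * c):    # (A 1.4): s = 0, ..., 2c-1
                src = pos + 1 - s     # position of the index i + 1 - s
                if 0 <= src < size:   # (A 1.3): zero whenever |i| >= c
                    total += inv_fact[s] * u[src]
            new[pos] = total
        u = new
    # (A 1.1): (N!/N^N) e^N R_{0,N}(c) equals (N!/N^N) times u at i = 0.
    log_scale = math.lgamma(n + 1) - n * math.log(n)
    return math.exp(log_scale) * u[c - 1]

if __name__ == "__main__":
    print(kolmogorov_cdf(2, 1), kolmogorov_cdf(3, 1))
```

Small cases can be checked by hand: $P\{D_2 < 1/2\} = 1/2$ and $P\{D_3 < 1/3\} = 6/27$, in agreement with the leading polynomial pieces $2(2v)^2$ and $6(2v)^3$ of section A 4.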


A 2. Truncation and truncation error 


In the following all derivations are carried out for $c$ fixed; the 
argument $c$ will, therefore, be omitted. 

We “truncate” the right-hand sums in (A 1.4) by retaining only the 
terms for $s = 0, 1, \ldots, r$, where $r < 2c - 1$, so that the $R_{i,k}(c)$ are 
replaced by quantities $S_{i,k}$ defined by the recursive formulas 

(A 2.1) $S_{0,0} = 1$, $S_{i,0} = 0$ for $i \ne 0$ 
(A 2.2) $S_{i,k} = 0$ for $|i| \ge c$ 
(A 2.3) $S_{i,k+1} = e^{-1} \sum_{s=0}^{r} \frac{1}{s!} S_{i+1-s,k}$ for $|i| \le c - 1$. 

The resulting “truncation error” $R_{i,k} - S_{i,k}$ satisfies the inequality 

(A 2.4) $0 \le R_{i,k} - S_{i,k} \le 1 - \left(e^{-1} \sum_{\nu=0}^{r} \frac{1}{\nu!}\right)^k = M_k$. 

This inequality follows by induction from (A 1.4), (A 2.3) and the fact, 
easily verified (again by induction), that $0 \le R_{i,k}(c) \le 1$. 

Example: for $k \le 100$, $r = 12$, inequality (A 2.4) yields the upper 
bound for the truncation error: $M_k < k \cdot 10^{-10} \le 10^{-8}$. 
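
The size of $M_k$ in the example can be confirmed numerically. The Python sketch below is illustrative only; the function name and the `terms` cutoff are assumptions. It sums the discarded tail $e^{-1}\sum_{\nu>r} 1/\nu!$ directly, since evaluating $1 - (\ldots)^k$ literally would lose precision to cancellation.

```python
import math

def truncation_bound(k, r, terms=60):
    """M_k = 1 - (e**-1 * sum_{v=0}^{r} 1/v!)**k, as in (A 2.4)."""
    # e**-1 * sum_{v=0}^{r} 1/v! equals 1 - tail, where tail collects v > r.
    tail = math.exp(-1) * sum(1 / math.factorial(v)
                              for v in range(r + 1, r + 1 + terms))
    # 1 - (1 - tail)**k, computed stably.
    return -math.expm1(k * math.log1p(-tail))

if __name__ == "__main__":
    print(truncation_bound(100, 12))  # should lie below 100 * 1e-10 = 1e-8
```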
A 3. Round-off error 

To perform the computations on a machine with a capacity of $t$ 
decimal digits, we introduce auxiliary numbers $\tau$, $\varphi(s)$, and $a$, defined by 

$$\tfrac{1}{2}\, 10^{-t} = \tau,$$

(A 3.01) $\frac{1}{s!} = \varphi(s)\, 10^{-\mu_s} + E_s = \left(\sum_{j=1}^{t} a_j 10^{-j}\right) 10^{-\mu_s} + E_s,$ 

where $|E_s| \le 10^{-\mu_s} \tau$ and $a_1 \ge 1$, 

(A 3.02) $e^{-1} = a + E$, where $|E| \le \tau$. 

Whenever $0 \le u \le 1$, $0 \le v \le 1$, and $u$, $v$ are $t$-digit numbers, $u \times v$ will 
denote the result of computing the product $uv$ exactly and then rounding 
off to $t$ digits after the decimal point, so that 

$u \times v = uv + G$, where $|G| \le \tau$. 

Whenever $0 \le f \le 1$, we will denote by $\{f\}$ the result of rounding $f$ off to 
$t$ digits after the decimal point so that 

$\{f\} = f + F$, where $|F| \le \tau$. 


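We now calculate the numbers $T_{i,k}$ defined by the recursive 
relationships 

(A 3.1) $T_{0,0} = 1$, $T_{i,0} = 0$ for $i \ne 0$ 
(A 3.2) $T_{i,k} = 0$ for $|i| \ge c$ 
(A 3.3) $T_{i,k+1} = \left\{ a \times \left[ T_{i+1,k} + T_{i,k} + \sum_{s=2}^{r} \left\{ (T_{i+1-s,k} \times \varphi(s))\, 10^{-\mu_s} \right\} \right] \right\}.$ 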
For the “round-off error” $S_{i,k} - T_{i,k}$ we have the inequality 

(A 3.4) $|S_{i,k} - T_{i,k}| \le \beta (1 + \alpha + \alpha^2 + \cdots + \alpha^{k-1}) = u_k$ 

where 
a =a), ¢(s)10™ 


B 


esi E,| +e| E| +r{a( 2 10~ +r — 1) +1] 
s=n2 


and $E_s$, $\mu_s$, $E$ are defined by (A 3.01) and (A 3.02). Inequality (A 3.4) 
follows by induction from (A 2.3) and (A 3.3). 

Example: for $r = 12$, $t = 10$, one obtains from (A 3.4) the estimate 
$u_k < 3.33k \cdot 10^{-10}$, hence for $k \le 100$ the round-off error is always less than 
$3.33 \cdot 10^{-8}$. 

A 4. Computation of Table 2 


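It is not difficult to show that the probability distribution of $D_N$ is 
given by 

(A 4.1) $P\left(D_N < \frac{1}{2N} + v\right) = N! \int_{1/2N - v}^{1/2N + v} \int_{3/2N - v}^{3/2N + v} \cdots \int_{(2N-1)/2N - v}^{(2N-1)/2N + v} g(u_1, u_2, \ldots, u_N)\, du_N \cdots du_2\, du_1$ 

for $0 \le v \le (2N-1)/2N$,⁴ where 

$$g(u_1, u_2, \ldots, u_N) = \begin{cases} 1 & \text{for } 0 \le u_1 \le u_2 \le \cdots \le u_N \le 1 \\ 0 & \text{elsewhere.} \end{cases}$$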

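For small values of $N$, (A 4.1) can be evaluated by quadrature. In 
particular one obtains 

$$P\left(D_2 < \tfrac{1}{4} + v\right) = \begin{cases} 2(2v)^2 & \text{for } 0 \le v \le \tfrac{1}{4} \\ 1 - 2\left(\tfrac{3}{4} - v\right)^2 & \text{for } \tfrac{1}{4} \le v \le \tfrac{3}{4} \end{cases}$$

$$P\left(D_3 < \tfrac{1}{6} + v\right) = \begin{cases} 6(2v)^3 & \text{for } 0 \le v \le \tfrac{1}{6} \\ \ldots & \text{for } \tfrac{1}{6} \le v \le \tfrac{2}{6} \\ \ldots & \text{for } \tfrac{2}{6} \le v \le \tfrac{3}{6} \\ \ldots & \text{for } \tfrac{3}{6} \le v \le \tfrac{5}{6} \end{cases}$$

⁴ It is easily seen that $P(D_N < u) = 0$ for $0 \le u \le 1/2N$, so that the case $-(1/2N) \le v \le 0$ need not be 
considered. 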

Similar expressions have been obtained for $N = 4$ and 5. For larger $N$ 
the evaluation of (A 4.1) soon seems to become prohibitive. 

For $N = 2, 3, 4, 5$ the values of $\varepsilon_{N,.95}$ and $\varepsilon_{N,.99}$ given in Table 2 
were obtained by equating the polynomials obtained from (A 4.1) to 
.95 and .99, respectively, and solving the resulting algebraic equations 
of degree $N$. For $N \ge 10$ the tabulated values of $\varepsilon_{N,.95}$ and $\varepsilon_{N,.99}$ were 
obtained by inverse interpolation from Table 1. 
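
For $N = 2$ the procedure of equating the polynomial to .95 or .99 can be illustrated with a short Python sketch. The function names are hypothetical. The branch for $1/4 \le v \le 3/4$ is written as $1 - 2(3/4 - v)^2$, the standard closed form for two observations; this is an assumption that may differ in algebraic form from the printed expression. Bisection is used in place of solving the quadratic, purely for brevity.

```python
def cdf_d2(v):
    """P{D_2 < 1/4 + v} for 0 <= v <= 3/4."""
    if v <= 0.25:
        return 2 * (2 * v) ** 2
    return 1 - 2 * (0.75 - v) ** 2

def percentage_point_n2(level, tol=1e-12):
    """Solve cdf_d2(v) = level by bisection and return 1/4 + v."""
    lo, hi = 0.0, 0.75
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf_d2(mid) < level:
            lo = mid
        else:
            hi = mid
    return 0.25 + (lo + hi) / 2

if __name__ == "__main__":
    # Compare with the N = 2 row of Table 2, columns (2) and (3).
    print(round(percentage_point_n2(0.95), 4), round(percentage_point_n2(0.99), 4))
```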











THE MULTIPLE-PARTIAL CORRELATION COEFFICIENT* 


DUDLEY J. COWDEN 
University of North Carolina 


A partial correlation coefficient which is also a multiple cor- 
relation coefficient is discussed. Its relationship with other 
well-known coefficients is explained. Computational methods 
for computing the estimating equation and the correlation co- 
efficient are suggested. 


CONCEPT 


Three coefficients of correlation involving similar interpretations are 
in common use: 

Simple: the linear correlation between the dependent variable and 
one independent variable; 

Multiple: the simple correlation between the dependent variable 
and an estimate of that variable obtained from a linear equation 
involving two or more independent variables; 

Partial: the simple correlation between the dependent variable and 
one independent variable after adjusting each for the effect of one 
or more other variables. 

This paper is concerned with an extension of the concepts involved in 
this family of coefficients to include a coefficient which is both a 
multiple and a partial coefficient: 

Multiple-partial: the multiple correlation between the dependent 
variable and two or more independent variables, when the 
dependent variable and the independent variables have been 
adjusted for the effect of one or more other variables. 


APPLICATIONS 


Economists frequently adjust two time series for trend before 
correlating them. This is usually done by expressing each variable as a 
deviation from, or percentage of, its trend. Instead of the latter 
procedure it would frequently be appropriate to use logarithms of the two 
series, fit trends to the logarithms, and adjust by subtracting the 
logarithmic trend values from the logarithms of the original data. Thus, if 
$X_0 = \log Y_0$, $X_2 = \log Y_2$, and $x_1$ = years numbered consecutively from 
the middle of the period, then the correlation between the adjusted 
variables is the partial correlation coefficient $r_{02.1}$. If now $X_3 = \log Y_3$, 
where $Y_3$ is another independent variable, the correlation between the 
adjusted dependent variable $x_{0.1}$ and the adjusted independent variables 
$x_{2.1}$ and $x_{3.1}$ is the multiple-partial correlation coefficient $r_{0(23).1}$. 
The writer has recently experimented with a forecasting problem, in 
which there were two independent lagged variables, each adjusted for 
a second-degree polynomial trend. This resulted in the coefficient 
$r_{0(34).12}$. 

* The writer wishes to thank Professors Harold Hotelling, George E. Nicholson, and John H. Smith 
for critically reading the manuscript and offering valuable comments. Professor Hotelling indicated the 
method of computation which he had suggested in an unpublished paper (see note 5). Professor Smith 
called the writer's attention to some of the earlier references to the subject in the literature. Since the 
first draft of this paper was written (June, 1951), it has been learned that Professor C. Horace Hamilton, 
of the North Carolina State College of Agriculture and Engineering, has written an article entitled 
“Population Pressure and Other Factors Affecting Net Rural-Urban Migration,” in which the coefficient 
of multiple-partial correlation is used. This article appears in Social Forces, 30 (December, 1951), pp. 
209-15. The formula used is that attributed by the present writer to John H. Smith (see note 7). 

Problems, the results of which could be summarized by a multiple-partial 
correlation coefficient, are of course not limited to cases of 
adjustment for trend. One might wish to correlate the weight of boys with 
height and hip girth, after adjustment for age; or grades in Economics 
with verbal ability and numerical ability, after adjustment for amounts 
of study; or price of a corporation stock with dividends per share and 
earnings per share, after adjustment for an index of business 
confidence; or unit cost of production for a corporation with volume of 
production and hourly earnings of labor, after adjusting for economic 
plant capacity of the corporation and output per man hour for the 
entire industry. Samuel A. Stouffer¹ has correlated juvenile delinquency 
rate with log of unemployment rate and log of dependency rate, after 
adjusting for percentage of native whites. C. Horace Hamilton 
correlated net migration with an index of population pressure, change in 
cotton acreage, change in tobacco acreage, and change in other crop 
acreage, after adjusting for county size; also population pressure with 
the three acreages after adjusting for population pressure and county 
size.² 

NOTATION 


In this paper the subscript 0 is used to indicate the dependent variable. 
This subscript, rather than 1, is used so that in the convenient 
computation tables illustrated the last row and last column can be 
assigned to the dependent variable without upsetting the natural numbering 
system. Another method is to assign the largest numerical subscript 
to the dependent variable.³ This method, however, requires renumbering 
the dependent variable in case the problem is enlarged to include 




¹ Samuel A. Stouffer, “A Coefficient of ‘Combined Partial Correlation’ with an Example from 
Sociological Data,” Journal of the American Statistical Association, 29 (1934), pp. 70-71. 

² C. Horace Hamilton, op. cit. 

³ See, for example, Paul S. Dwyer, “Recent Developments in Correlation Technique,” Journal of 
the American Statistical Association, 37 (1942), pp. 441-60. 
additional independent variables, or reduced by excluding non-significant 
variables. A subscript or subscripts after the 0 (if more than one 
they are enclosed by parentheses) indicates the dependent variable 
computed from an estimating equation involving that subscript or 
those subscripts. A subscript after a decimal point indicates a residual 
after subtracting the estimated value from the actual value of a 
variable.⁴ For the sake of uniformity, symbols and formulas used by other 
writers will be presented in the above notation in this paper. 


ELUCIDATION OF CONCEPTS 


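First consider coefficients of determination involving a minimum of 
variables. Symbols are defined as follows: 

$x = X - \bar{X}$, a deviation from the mean of a variable. 

$x_{01} = b_{01} x_1$, where $b_{01} = \sum x_0 x_1 / \sum x_1^2$. 

$x_{0.1} = x_0 - x_{01}$. 

Other “unexplained” deviations, such as $x_{0.2}$, $x_{1.2}$, $x_{2.1}$, are similarly 
defined. 

$x_{0(12)} = b_{01.2} x_1 + b_{02.1} x_2$, $b_{01.2}$ and $b_{02.1}$ being defined by the equations: 

$$\sum x_0 x_1 = b_{01.2} \sum x_1^2 + b_{02.1} \sum x_1 x_2;$$
$$\sum x_0 x_2 = b_{01.2} \sum x_1 x_2 + b_{02.1} \sum x_2^2.$$

$$b_{01.2} = \frac{\sum x_{0.2} x_{1.2}}{\sum x_{1.2}^2} \quad \text{and} \quad b_{02.1} = \frac{\sum x_{0.1} x_{2.1}}{\sum x_{2.1}^2}.$$

$x_{02.1} = x_{0(12)} - x_{01}$. 

Note also that 

$x_{0(23).1} = x_{0(123)} - x_{01}$. 

Then: 

$$r^2_{01} = \frac{\sum x_{01}^2}{\sum x_0^2}; \qquad r^2_{0(12)} = \frac{\sum x_{0(12)}^2}{\sum x_0^2};$$

$$r^2_{02.1} = \frac{\sum x_{02.1}^2}{\sum x_{0.1}^2}; \qquad r^2_{0(23).1} = \frac{\sum x_{0(23).1}^2}{\sum x_{0.1}^2}.$$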





4 The symbols in this paper are the same as, or an extension of, those used in F. E. Croxton and 
D. J. Cowden, Practical Business Statistics, Second Edition (New York: Prentice-Hall, 1948). See es- 
pecially pp. 432-49. The symbol for the multiple-partial correlation coefficient is the same as that used 
by Hamilton except that he uses R instead of r. (See Hamilton, op. cit.) 
In each case the coefficient of determination is the ratio of the amount 
of variation which has been explained by use of the estimating equation 
to the amount of variation sought to be explained. The last of the four 
coefficients is the multiple-partial coefficient. It tells the proportion of 
variation in variable 0 that has been explained by variables 2 and 3, 
after adjusting variables 0, 2, and 3 for the effect of variable 1. 

Now consider a slightly more complex coefficient, $r^2_{0(34).12}$. The following 
hypothetical data are given so that the reader can check the 
procedure if he finds it helpful. 

x_1   x_2   x_3   x_4   x_0 
 -3    -3    -3    -3    -4 
 -2     0    -2    -1    -3 
 -1    -2     1     1     1 
  0     2    -1     2     2 
  1    -1     0    -2     0 
  2     2     3    -1     3 
  3     2     2     4     1 

For the sake of simplicity of exposition, all observations are shown as 
deviations from their respective means. Variable 1 appears to be a 
function of time. We may first make estimates of the dependent variable 
$y_0$ from the independent variables $y_3$ and $y_4$, where: 

$y_0 = x_{0.12} = x_0 - x_{0(12)}$; 
$y_3 = x_{3.12} = x_3 - x_{3(12)}$; 
$y_4 = x_{4.12} = x_4 - x_{4(12)}$. 

Using these $y$ values we obtain the following results: 

$B_{03.4} = .8925$; 
$B_{04.3} = .0881$; 
$y_{0(34)} = .8925 y_3 + .0881 y_4$; 

$$r^2_{0(34)} = \frac{\sum y^2_{0(34)}}{\sum y_0^2} = \frac{5.8790}{14.8537} = .3958.$$

$r^2_{0(34)}$ for the $y$ values is also $r^2_{0(34).12}$ for the $x$ values. 
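
The residual-based definition can be checked on the hypothetical data with a short Python sketch. It is an illustration only: the function names are hypothetical, and successive orthogonalization (Gram-Schmidt) is used as a convenient stand-in for solving the sets of normal equations. No constant term is needed because every series is already a deviation from its mean.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def residual(y, basis):
    """Residual of y after regression on mutually orthogonal vectors."""
    r = list(y)
    for q in basis:
        coef = dot(r, q) / dot(q, q)
        r = [ri - coef * qi for ri, qi in zip(r, q)]
    return r

def orthogonalize(columns):
    """Replace each column by its residual on the columns before it."""
    basis = []
    for col in columns:
        basis.append(residual(col, basis))
    return basis

def multiple_partial_r2(dependent, active, controls):
    """Share of the control-adjusted variation explained by the active variables."""
    control_basis = orthogonalize(controls)
    y0 = residual(dependent, control_basis)              # e.g. x_{0.12}
    ys = [residual(a, control_basis) for a in active]    # e.g. x_{3.12}, x_{4.12}
    unexplained = residual(y0, orthogonalize(ys))
    return 1 - dot(unexplained, unexplained) / dot(y0, y0)

# Hypothetical data from the text, as deviations from the means.
x1 = [-3, -2, -1, 0, 1, 2, 3]
x2 = [-3, 0, -2, 2, -1, 2, 2]
x3 = [-3, -2, 1, -1, 0, 3, 2]
x4 = [-3, -1, 1, 2, -2, -1, 4]
x0 = [-4, -3, 1, 2, 0, 3, 1]

if __name__ == "__main__":
    print(round(multiple_partial_r2(x0, [x3, x4], [x1, x2]), 4))
```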


HISTORICAL SUMMARY 


Perhaps the first mention of such a coefficient which appears in the 
literature was by Harold Hotelling,⁵ in a paper presented at the 47th 
regular meeting of the San Francisco section of the American 
Mathematical Society at the University of California on October 31, 1925. 

⁵ See abstract in Bulletin of American Mathematical Society, 32 (1926), p. 98. 
The title of the paper, of which only an abstract has been published, 
was “Mixed Multiple Partial Correlation.” Hotelling suggested a 
method of computation based on determinants. For the simplest case, 

$$r^2_{0(23).1} = 1 - \frac{\Delta_{.1}}{\Delta_{00.1}} = \frac{\Delta_{00.1} - \Delta_{.1}}{\Delta_{00.1}},$$

where 

$$\Delta_{.1} = \begin{vmatrix} 1 & r_{02.1} & r_{03.1} \\ r_{02.1} & 1 & r_{23.1} \\ r_{03.1} & r_{23.1} & 1 \end{vmatrix}$$

and $\Delta_{00.1}$ is the determinant of the matrix obtained by deleting row 0 
and column 0 from the $r_{ij}$ matrix; i.e., 

$$\Delta_{00.1} = \begin{vmatrix} 1 & r_{23.1} \\ r_{23.1} & 1 \end{vmatrix}.$$

This method can be expanded by using second-order or higher 
coefficients in the matrix, or by increasing the number of columns and rows 
to include more independent variables other than those for which 
adjustment is made, or by doing both. 

In 1934 Samuel A. Stouffer suggested a “combined partial 
correlation” coefficient⁶ and gave the following formula, which is an extension 
of a familiar multiple correlation formula: 

$$r^2_{0(23).1} = 1 - (1 - r^2_{02.1})(1 - r^2_{03.12}).$$

As Stouffer indicated, this expression can readily be expanded to take 
care of more complex problems. For example, 

$$r^2_{0(345).12} = 1 - (1 - r^2_{03.12})(1 - r^2_{04.123})(1 - r^2_{05.1234}).$$

He also suggested, apparently for problems where there is only one 
“active” independent variable, 

$$r^2_{0(23).1} = \frac{r^2_{02.1} + r^2_{03.1} - 2 r_{02.1} r_{03.1} r_{23.1}}{1 - r^2_{23.1}}.$$

Except for notation, this is a particular application of Hotelling’s 
formula. 
In 1942, John H. Smith,⁷ in a review of Mordecai Ezekiel, Methods 
of Correlation Analysis (Second Edition), states the test of significance 
for the function 

$$r^2_{0(23).1} = \frac{r^2_{0(123)} - r^2_{01}}{1 - r^2_{01}}.$$

(Tests of significance are discussed in a section of that title in the 
present article.) Smith did not give a name, designate a symbol, nor ascribe 
a meaning to this function. The above formula is an extension of a partial 
correlation coefficient sometimes used,⁸ and is likewise readily extended: 

$$r^2_{0(345).12} = \frac{r^2_{0(12345)} - r^2_{0(12)}}{1 - r^2_{0(12)}}.$$

⁶ Samuel A. Stouffer, op. cit. 
⁷ See Journal of the American Statistical Association, 37 (1942), pp. 398-99. 
EFFICIENT COMPUTATIONAL PROCEDURES 


The procedure illustrated under Elucidation of Concepts is 
straightforward and orthodox, but it is excessively laborious. Three preliminary 
sets of normal equations must be solved, three sets of estimated 
values computed from these estimating equations, and three sets of 
residuals obtained by subtraction; finally a fourth set of $\sum x_i x_j$ values 
must be computed, and a fourth set of normal equations solved. Some 
of this labor is unnecessary, since all of the needed values can be 
obtained by solving one set of five (rather than four sets of three) normal 
equations. 

Abbreviated Doolittle method. The abbreviated Doolittle solution⁹ of 
the five normal equations is shown in Table 1. The variables for which 
adjustment is to be made must be at the top and left. The values 
(shown in italics) in columns (3), (4), (0), [3], [4], [0] for rows III, III’, 
IV, IV’, O, O’ are identical (except for rounding error) with the 
corresponding cells for the solution (not shown here) of the three normal 
equations involving the adjusted, or $y$, values. In particular, it is 
worthwhile to notice the following equalities: 

Symbol for coefficient    Symbol for coefficient    Value 
for unadjusted,           for adjusted, 
or x, values              or y, values 
$b_{03.12}$               $B_{03}$                  .90121 
$b_{03.124}$              $B_{03.4}$                .89251 
$b_{04.123}$              $B_{04.3}$                .08805 

⁸ See Mordecai Ezekiel, Methods of Correlation Analysis, Second Edition (New York: John Wiley 
and Sons, 1941), p. 214. 

⁹ See Dudley J. Cowden, “Correlation Concepts and the Doolittle Solution,” Journal of the American 
Statistical Association, 38 (1943), pp. 327-34. 
The coefficient of multiple-partial correlation is based on column (0) 
of the Doolittle table. 

Column (0)                                                      Product                         Cumulative sum of products 
$\sum x_0 x_1 = 26$, $b_{01} = .92857$                          $\sum x^2_{01} = 24.1429$       $\sum x^2_{01} = 24.1429$ 
$\sum x_{0.1} x_{2.1} = 3.42857$, $b_{02.1} = .29268$           $\sum x^2_{02.1} = 1.0035$      $\sum x^2_{0(12)} = 25.1463$ 
$\sum x_{0.12} x_{3.12} = 6.3414$, $b_{03.12} = .90121$         $\sum x^2_{03.12} = 5.7150$     $\sum x^2_{0(123)} = 30.8614$ 
$\sum x_{0.123} x_{4.123} = 1.86135$, $b_{04.123} = .08805$     $\sum x^2_{04.123} = .1639$     $\sum x^2_{0(1234)} = 31.0253$ 
$\sum x^2_{0.1234} = 8.9747$                                    $\sum x^2_{0.1234} = 8.9747$    $\sum x_0^2 = 40.0000$ 

$$r^2_{0(34).12} = \frac{\sum x^2_{0(34).12}}{\sum x^2_{0.12}} = \frac{\sum x^2_{0(1234)} - \sum x^2_{0(12)}}{\sum x_0^2 - \sum x^2_{0(12)}} = \frac{31.0253 - 25.1463}{40.0000 - 25.1463} = \frac{5.8790}{14.8537}$$

$$= \frac{\sum x^2_{03.12} + \sum x^2_{04.123}}{\sum x^2_{0.12}} = \frac{5.7150 + .1639}{14.8537} = \frac{5.8789}{14.8537} = .3958.$$

$r_{0(34).12} = .629$. Being essentially a multiple correlation coefficient, no 
sign is attached to it. This is the same value that was obtained as 
$r_{0(34)}$ for the $y$ values,¹⁰ since $\sum x^2_{0(34).12} = \sum y^2_{0(34)}$ and $\sum x^2_{0.12} = \sum y_0^2$. 
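
The sums of squares in column (0) are enough to confirm that the formulas quoted in the Historical Summary give the same coefficient. The Python sketch below uses the rounded figures printed above, so agreement is only to about four decimals. The variable names are illustrative, and Stouffer's product formula is applied here in its two-factor form for variables 3 and 4 adjusted for 1 and 2.

```python
# Sums of squares as printed in column (0) of the Doolittle table.
total = 40.0000          # sum of x_0^2
ss_0_12 = 25.1463        # explained by variables 1 and 2
ss_0_1234 = 31.0253      # explained by variables 1, 2, 3 and 4
ss_03_12 = 5.7150        # added by variable 3 after 1 and 2
ss_04_123 = 0.1639       # added by variable 4 after 1, 2 and 3

unexplained_12 = total - ss_0_12          # sum of x_{0.12}^2

# Definition: explained share of the variation left after adjustment.
by_definition = (ss_03_12 + ss_04_123) / unexplained_12

# Smith's form: (r^2_{0(1234)} - r^2_{0(12)}) / (1 - r^2_{0(12)}).
r2_0_1234 = ss_0_1234 / total
r2_0_12 = ss_0_12 / total
by_smith = (r2_0_1234 - r2_0_12) / (1 - r2_0_12)

# Stouffer's form: 1 - (1 - r^2_{03.12})(1 - r^2_{04.123}).
r2_03_12 = ss_03_12 / unexplained_12
r2_04_123 = ss_04_123 / (unexplained_12 - ss_03_12)
by_stouffer = 1 - (1 - r2_03_12) * (1 - r2_04_123)

if __name__ == "__main__":
    print(round(by_definition, 4), round(by_smith, 4), round(by_stouffer, 4))
```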
The Doolittle table shows: 

$b_{01} = .92857$; 
$b_{21} = .71429$; 
$b_{02.1} = .29268$. 

$b_{01.2}$ is easily obtained from a relationship among these three regression 
coefficients: 

$$b_{01.2} = b_{01} - b_{21} b_{02.1} = .92857 - (.71429)(.29268) = .71950.$$

The estimating equation is 

$$x_{0(34).12} = (b_{01.234} - b_{01.2}) x_1 + (b_{02.134} - b_{02.1}) x_2 + b_{03.124} x_3 + b_{04.123} x_4$$
$$= (-.18574 - .71950) x_1 + (.40977 - .29268) x_2 + .89251 x_3 + .08805 x_4$$
$$= -.90524 x_1 + .11709 x_2 + .89251 x_3 + .08805 x_4.$$

Using this estimating equation we obtain (except for rounding error) 
the same values for $x_{0(34).12}$ that we do for $y_{0(34)}$. 

¹⁰ Also $r_{03.12}$ for the $x$ values = $r_{03}$ for the $y$ values and $r_{04.123}$ for the $x$ values = $r_{04.3}$ for the $y$ values. 

In the abbreviated Doolittle solution the uncoded measures of 
variation $\sum x_i x_j$ were used. When the values in the original matrix differ 
greatly in size, it is desirable to code the data in order to avoid the 
necessity of using a large number of decimal places and/or digits in the 
solution. The writer customarily codes by multiplying each variable by 
that power of 10 which makes the different entries in the original matrix 
as close to 1 as possible. This can be done either before or after computing 
the $\sum x_i x_j$ values. Decoding of the coefficients in the estimating 
equation is then easily accomplished by the shifting of decimal points. 
Many persons prefer to code by using simple correlation coefficients 
$r_{ij}$ instead of the $\sum x_i x_j$ values.¹¹ The advantage of such a procedure is 
that on the basis of these correlation coefficients a preliminary estimate 
can be made of the relative importance of the independent variables so 
that those variables can be arranged from left to right in the Doolittle 
table according to their approximate relative importance. This permits 
the number of variables to be increased until the results are 
satisfactory, or non-significant variables to be dropped out, with a minimum 
of recomputation. The disadvantage of using the $r$’s is the labor 
involved in their computation and also in converting the $\beta$’s yielded by 
the table into $b$’s. 

Square-root method. The square-root method¹² provides a more compact
solution, for each variable has only one row instead of two. Although
the square-root method can be used conveniently with the
$\sum x_i x_j$ matrix, we shall illustrate its use for our data with the $r_{ij}$
matrix. Computations are in Table 2.

The $\beta$ coefficients may be obtained by multiplying each of the items
in row 0 (except the first and last) by the first item in that row, i.e.,
by¹³ $r_{0.1234} = \sqrt{1 - r^2_{0(1234)}} = .47368$. (Actually they were obtained by
recording in the $\beta$ row the cumulative sums of products that were
divided by $r_{0.1234}$ to obtain the values recorded to the right of the vertical
double line in row 0.) They are converted into b coefficients by
multiplying by the ratio $s_0/s_j = \sqrt{\sum x^2_0 / \sum x^2_j}$. Note the agreement of these
values with the values recorded in the next to the last row of Table 1.

11 This is equivalent to coding the original variables by subtracting the mean from each observation
and dividing by $\sqrt{\sum x^2}$.

12 See Paul S. Dwyer, “The Square Root Method and Its Use in Correlation and Regression,”
Journal of the American Statistical Association, 40 (1945), pp. 493-503.

13 Just as the coefficient of multiple correlation $r_{0(1234)}$ may be regarded as the simple correlation
between $x_0$ and $x_{0(1234)}$, so the coefficient of multiple alienation $r_{0.1234}$ may be regarded as the simple
correlation between $x_0$ and $x_{0.1234}$.
The coefficient of multiple-partial correlation is obtained by a
procedure analogous to that used in connection with Table 1. The values
recorded in column (0) and their cumulative sums of squares are

Row     Entry                                      Square of Entry     Cumulative Sum of Squares
i       $\beta_{01} = .77690$                        .60357              $r^2_{0(1)} = .60357$
ii      $r_{2.1}\beta_{02.1} = .15839$               .02509              $r^2_{0(12)} = .62866$
iii     $r_{3.12}\beta_{03.12} = .37798$             .14287              $r^2_{0(123)} = .77153$
iv      $r_{4.123}\beta_{04.123} = .06401$           .00410              $r^2_{0(1234)} = .77563$
0       $r_{0.1234} = .47368$                        .22437              $r^2_{00} = 1.00000$

The coefficient of multiple-partial correlation is
$$r^2_{0(34).12} = \frac{r^2_{0(1234)} - r^2_{0(12)}}{1 - r^2_{0(12)}} = \frac{.77563 - .62866}{1.00000 - .62866} = \frac{.14697}{.37134} = .3958,$$
or we may compute
$$r^2_{0(34).12} = \frac{r^2_{3.12}\beta^2_{03.12} + r^2_{4.123}\beta^2_{04.123}}{1 - r^2_{0(12)}} = \frac{.14287 + .00410}{.37134} = \frac{.14697}{.37134} = .3958.$$


It should be noted that, using the square-root method or the abbreviated
Doolittle method illustrated earlier, the following multiple-partial
coefficients (with Variable 0 dependent) are obtained with equal ease:
$r_{0(23).1}$; $r_{0(234).1}$; $r_{0(34).12}$.
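
As a check on the square-root computation, the cumulative sums of squares and the resulting coefficient can be reproduced from the five column (0) entries alone. This is a minimal sketch with hypothetical variable names; the entries are those quoted in the text.

```python
import math

# Column (0) entries of the square-root solution: rows i to iv, then row 0.
entries = [0.77690, 0.15839, 0.37798, 0.06401, 0.47368]

squares = [e * e for e in entries]
cumulative = []
running = 0.0
for s in squares:
    running += s
    cumulative.append(running)

r2_passive = cumulative[1]  # r^2 of variable 0 on variables 1 and 2
r2_all = cumulative[3]      # r^2 of variable 0 on variables 1, 2, 3 and 4
r2_multiple_partial = (r2_all - r2_passive) / (1.0 - r2_passive)
alienation = math.sqrt(1.0 - r2_all)  # should agree with the row 0 entry

print([round(c, 5) for c in cumulative])
print(round(r2_multiple_partial, 4), round(alienation, 5))
```

The last cumulative sum should come out at 1 apart from rounding, which is a convenient check on the whole column.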


TESTS OF SIGNIFICANCE 


The multiple-partial correlation coefficient may be tested for
significance by use of analysis of variance. An appropriate hypothesis is
$\rho^2_{0(34).12} = 0$, where $\rho$ refers to a population correlation coefficient, and
the alternative hypothesis is $\rho^2_{0(34).12} \ne 0$. It is assumed that the values
of the independent variables $X_1, X_2, X_3, \cdots$ are “fixed.” The equation
for estimating values of $X_{0(123\cdots)}$ has m coefficients, including the
constant a. There are p “passive” variables, for which the other variables
are adjusted. They are assigned the lower of the numerical subscripts
for the independent variables (for convenience in compact methods of
solution) and are preceded by a decimal point in the symbol for the
coefficient. There are q “active” independent variables, enclosed by
parentheses. $m = p + q + 1$. Table 3 shows the basic data for testing our
coefficient.











TABLE 3
COMPUTATION OF VALUES FOR TEST OF SIGNIFICANCE OF $r_{0(34).12}$

Source of Variation            Amount of Variation                  Degrees of Freedom     Estimate of Variance
Total                          $\sum x^2_0 = 40.0000$               $N - 1 = 6$
All independent variables      $\sum x^2_{0(1234)} = 31.0253$       $m - 1 = 4$
“Passive” variables            $\sum x^2_{0(12)} = 25.1463$         $p = 2$
“Active” variables             $\sum x^2_{0(34).12} = 5.8790$       $q = 2$                2.940
Residual                       $\sum x^2_{0.1234} = 8.9747$         $N - m = 2$            4.487

Source: Table 1.

$$F = \frac{\sum x^2_{0(34).12} \div q}{\sum x^2_{0.1234} \div (N - m)} = \frac{2.940}{4.487} = .655.$$
If preferred, F may be computed from the coefficient of determination.
$$F = \frac{r^2_{0(34).12} \div q}{(1 - r^2_{0(34).12}) \div (N - m)} = \frac{.3958 \div 2}{.6042 \div 2} = .655.$$

The F table is entered with degrees of freedom $\nu_1 = q$ and $\nu_2 = N - m$.
Since $F < 1$ in the present instance, we conclude that the coefficient is
not significantly different from zero. (Actually, the test of significance
is in this case only illustrative of procedure, since the data are
hypothetical.)

If we wish to test the hypothesis that $\rho^2_{0(345).12} - \rho^2_{03.12} = 0$,
$$F = \frac{\left[\sum x^2_{0(345).12} - \sum x^2_{03.12}\right] \div (q_2 - q_1)}{\sum x^2_{0.12345} \div (N - m)} = \frac{\left[\sum x^2_{0(12345)} - \sum x^2_{0(123)}\right] \div 2}{\sum x^2_{0.12345} \div (N - 6)}.$$

The test of hypothesis that $\rho^2_{0(34).12} - \rho^2_{03.12} = 0$ is the same as the
test of the hypothesis that $\rho^2_{04.123} = 0$. When $q = 1$ the multiple-partial
correlation coefficient becomes the partial correlation coefficient. When
testing the partial correlation coefficient the usual t-test can be applied.
The sign of $r_{04.123}$ may be of interest, and a one-sided test may be
applied; for example, we may test the hypothesis that $\rho_{04.123} \le 0$, the
alternative hypothesis being that $\rho_{04.123} > 0$. If we wish to test the
hypothesis that $\rho_{04.123}$ is equal to or less than some value other than zero,
Fisher's z transformation can be used.
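
The variance ratio of Table 3 can be recomputed either from the sums of squares or from the coefficient of determination. The sketch below uses a hypothetical helper name and the figures of Table 3; looking up the critical value in an F table is left to the reader.

```python
def variance_ratio(active, residual, q, residual_df):
    """F = (active / q) / (residual / residual_df).

    `active` and `residual` may be sums of squares, or r^2 and 1 - r^2.
    """
    return (active / q) / (residual / residual_df)


N, p, q = 7, 2, 2   # observations, passive variables, active variables
m = p + q + 1       # coefficients in the estimating equation, including a

F_from_ss = variance_ratio(5.8790, 8.9747, q, N - m)
r2 = 0.3958
F_from_r2 = variance_ratio(r2, 1.0 - r2, q, N - m)

print(round(F_from_ss, 3), round(F_from_r2, 3))
```

Both routes give the same ratio because dividing numerator and denominator by the variation left after the passive variables does not change it.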
ANALOGY WITH ANALYSIS OF MULTIPLE COVARIANCE


A set of simple examples will serve to clarify the analogy. Assume 
first that we have selected at random 48 regions, each with approxi- 
mately the same number of families (which we will refer to as “coun- 
ties”), from k states, say Wyoming, South Dakota, and Nebraska. We 
collect data with respect to the following variables: 

0. Family expenditure on food as a per cent of total expenditure 

4. Family income 

5. Family size 
The latitude, longitude, and altitude of each county is also recorded. 
Use will be made of these variables later in this section. 

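First let us classify the data into k sections or “columns,” one for
each state. There are $n_j$ observations in each section; $n_1 + n_2 + \cdots + n_k = N$,
the total number of observations. We now compute the following measures of
variation or covariation for each of the k sections

$$\sum_1^{n} x_0^2,\qquad \sum_1^{n} x_0 x_4,\qquad \sum_1^{n} x_0 x_5,$$
$$\sum_1^{n} x_4^2,\qquad \sum_1^{n} x_4 x_5,$$
$$\sum_1^{n} x_5^2,$$

where $x_j = X_j - \bar{X}_j$.
Then we combine the data for the k sections, obtaining

$$\sum_1^{k}\sum_1^{n} x_0^2,\qquad \sum_1^{k}\sum_1^{n} x_0 x_4,\qquad \sum_1^{k}\sum_1^{n} x_0 x_5,$$
$$\sum_1^{k}\sum_1^{n} x_4^2,\qquad \sum_1^{k}\sum_1^{n} x_4 x_5,$$
$$\sum_1^{k}\sum_1^{n} x_5^2.$$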
Using these values we compute by orthodox methods the constants
$b_{04.5}$ and $b_{05.4}$. These may be thought of as average regression coefficients
for the average regression equation $x_{0(45)} = b_{04.5}x_4 + b_{05.4}x_5$. An average
coefficient of multiple correlation can also be computed, showing the
multiple correlation of variable 0 with variables 4 and 5 after adjusting
for the effect of the state in which each county is located.¹⁴ The average
coefficient of multiple determination is $\bar{r}^2_{0(45)} = \sum x^2_{0(45)} / \sum x^2_0$.
The variance ratio for testing significance is
$$F = \frac{\sum x^2_{0(45)} \div (m - 1)}{\sum x^2_{0.45} \div [(N - 1) - (k - 1) - (m - 1)]}.$$
Since, for our example, $N = 48$, $k = 3$ and $m = 3$, we may write
$$F = \frac{\sum x^2_{0(45)} \div 2}{\sum x^2_{0.45} \div 43}.$$

Let us now modify the procedure slightly so as to show the analogy
between multiple covariance and the multiple-partial correlation
coefficient. For purposes of this illustration we shall introduce the
following discrete variables, each of which can take only the value of 0 if
absent or 1 if present:

1. Wyoming
2. South Dakota
3. Nebraska

The multiple-partial correlation coefficient of determination can be
computed as usual.
$$r^2_{0(45).123} = \frac{\sum x^2_{0(45).123}}{\sum x^2_{0.123}} = \frac{\sum x^2_{0(12345)} - \sum x^2_{0(123)}}{\sum x^2_0 - \sum x^2_{0(123)}}.$$
Alternatively it can be computed as $\sum y^2_{0(45)} / \sum y^2_0$, where
$y_j = X_{j.123} = X_j - X_{j(123)}$. Solution of normal equations for the estimating equation
$X_{j(123)} = a + b_{j1.23}X_1 + b_{j2.13}X_2 + b_{j3.12}X_3$ gives

$$b_{j1.23} = \bar{X}_j\ (\text{Wyoming}) - a;$$
$$b_{j2.13} = \bar{X}_j\ (\text{South Dakota}) - a;$$
$$b_{j3.12} = \bar{X}_j\ (\text{Nebraska}) - a.$$

The constant term in the estimating equation is zero because $X_{j(123)}$
must be $b_{j1.23}$, $b_{j2.13}$, or $b_{j3.12}$. Therefore, each y value in column j is the
same as the $x_j$ values referred to in describing multiple covariance,
and $r_{0(45).123}$ = the average multiple correlation coefficient $\bar{r}_{0(45)}$. The
variance ratio for testing significance is
$$F = \frac{\sum x^2_{0(45).123} \div q}{\sum x^2_{0.12345} \div [(N - 1) - (p - 1) - q]} = \frac{\sum x^2_{0(45).123} \div 2}{\sum x^2_{0.12345} \div 43}.$$

14 Hamilton computed coefficients of multiple, partial, and multiple-partial correlation after
adjusting for regional differences. (See Hamilton, op. cit.) He did not, however, explain his tests of
significance.


Note that q has the same value as $m - 1$ for the estimating equation
$y_{0(45)} = b_{04.5}y_4 + b_{05.4}y_5$. Note also that $p - 1$, rather than p, degrees of
freedom are lost through use of the discrete passive variables, because
$p - 1$ of the p regression coefficients are independent of the value of $X_j$.

Finally, let us replace the discrete passive variables with continuous
ones:
1. Latitude
2. Longitude
3. Altitude
For testing significance we use
$$F = \frac{\sum x^2_{0(45).123} \div q}{\sum x^2_{0.12345} \div [(N - 1) - p - q]} = \frac{\sum x^2_{0(45).123} \div 2}{\sum x^2_{0.12345} \div 42}.$$

By using three continuous variables, each of which can take any value,
rather than only zero or one, we lose one more degree of freedom.
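
The bookkeeping of degrees of freedom in the two cases can be made explicit. The two function names below are hypothetical; they only restate the denominators of the variance ratios given above for the county example.

```python
def residual_df_discrete(N, p, q):
    """Residual degrees of freedom when the p passive variables are 0/1
    indicators of mutually exclusive classes (only p - 1 are lost)."""
    return (N - 1) - (p - 1) - q


def residual_df_continuous(N, p, q):
    """Residual degrees of freedom when the p passive variables are continuous."""
    return (N - 1) - p - q


N, p, q = 48, 3, 2  # counties, passive variables, active variables (income, size)
print(residual_df_discrete(N, p, q), residual_df_continuous(N, p, q))
```

With three state indicators the count equals that of the covariance analysis with $k = 3$ and $m = 3$; with latitude, longitude and altitude it is one less.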








ESTIMATION IN THE TRUNCATED 
NORMAL DISTRIBUTION 


Max HALPERIN 
National Heart Institute, Bethesda, Maryland 


Charts are presented which can be used to simplify estima-
tion of $\mu$ and $\sigma$ in the case of sampling from a singly truncated
normal distribution when (a) the point of truncation and the 
number of observations in the truncated portion are known, 
(b) the number of observations in the truncated portion is not 
known. A somewhat different iteration procedure for case (a) 
than given by other writers is suggested and an example is 
given. 


1. Introduction 


IN STATISTICAL applications one frequently encounters sampling
situations of what might be called the life-testing type. One example is
the administration of a lethal drug to a group of animals. In such a case 
the measurement of interest is the time to death of each animal. For 
various reasons it may not be expedient to wait till all the animals die 
so that one terminates the sampling procedure after some known time 
has elapsed. We have then a distribution which has been truncated on 
the right and for which the number of sample observations in the 
truncated portion is known. Another example arises in the testing of 
airplane propellers in a wind tunnel. In this case the variable measured 
is the wind velocity required to rupture the propeller. Since the pro- 
pellers are expensive and represent a complete loss if ruptured, it may 
be desired to terminate the test procedure after a specified number of 
propellers have ruptured, simply from considerations of cost. It is ap- 
parent that this second example is quite analogous to the first, the sole 
difference in the latter case being that the point of truncation is ran- 
dom, whereas in the former case the point of truncation is not specified 
to be random. 

It is frequently reasonable to assume that the variable being
measured (or some simple function of it, the logarithm of the measurement,
for example) is normally distributed with mean, $\mu$, and standard
deviation, $\sigma$. The problem of estimation of $\mu$ and $\sigma$ for this case has recently
been considered by Cohen [1] and Hald [3].

Cohen's solution employs rough estimates of $\sigma$ and $(T - \mu)/\sigma$ and
proceeds to a final solution via a modified Newton-Raphson method.
Hald's paper gives tables of the maximum likelihood estimate of
$(T - \mu)/\sigma$ at intervals of .005 on an easily computed sample statistic,
for values of $p = r/n$ at intervals of .05, starting with $p = .05$. In terms
of the sample statistic used in this paper, $V^2_{pn}$ (see Section 2), Hald's
tables yield solutions for $4 \le V^2_{pn} \le 12$. In this paper we present charts
which embrace the much larger range on $V^2_{pn}$, $4 \le V^2_{pn} \le 74$. Also, an
iteration procedure somewhat different than that of [1] is suggested.
Truncation is assumed to be on the right for this discussion, but the
slight modifications necessary for estimation in the case of left
truncation are also pointed out. We assume, then, that we have n independent
observations on

$$\frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right]$$

with truncation at a known point, T. We suppose that $r\ (<n)$ observations,
$x_1, \cdots, x_r$, all less than T, have been observed and the remaining
$n - r$ observations are known to be greater than T.

Hald [3] has also considered estimation of $\mu$ and $\sigma$ in a truncated
distribution when the number in the truncated portion is not known.
In this case, his paper gives tables of the maximum likelihood estimate
of $(T - \mu)/\sigma$ for an interval on a simple sample statistic. In terms of
the statistic used in this paper, $V^2$ (see Section 3), Hald's tables give
solutions for $4.4 \le V^2 \le 7.3$. This corresponds to $-3.145 \le (T - \mu)/\sigma \le 2.0$.
Cohen, in [2], gives a chart which yields solutions for $4.3 \le V^2 \le 7.6$,
corresponding to $-3.5 \le (T - \mu)/\sigma \le 3.5$. We give here a chart (Figure
5), which yields solutions for $4.15 \le V^2 \le 7.7$, corresponding to
$-5 \le (T - \mu)/\sigma \le 5$. We assume in this case that we have n independent
observations on

$$P(x) = \frac{1}{p\left(\frac{T - \mu}{\sigma}\right)} \cdot \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right]$$

where

$$p\left(\frac{T - \mu}{\sigma}\right) = \frac{1}{\sqrt{2\pi}} \int_{(T - \mu)/\sigma}^{\infty} \exp(-z^2/2)\,dz.$$

2. Derivation of Charts and Iteration Procedure when Number of Observa- 
tions in Truncated Portion is Known 


The sample likelihood for the first situation outlined in Section 1 is
given by

(2.1)
$$P(x_1, \cdots, x_r) = \frac{n!}{(n - r)!} \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{r} \exp\left[-\frac{1}{2\sigma^2} \sum_{i=1}^{r} (x_i - \mu)^2\right] \left[\frac{1}{\sqrt{2\pi}} \int_{(T - \mu)/\sigma}^{\infty} \exp(-z^2/2)\,dz\right]^{n - r}.$$

If the substitution, $\mu = T - \sigma h$, is made in (2.1), one gets upon taking the
logarithm of both sides,

(2.2)
$$\log P(x_1, \cdots, x_r) = \text{const} - r \log \sigma - \frac{1}{2\sigma^2} \sum_{i=1}^{r} (x_i - T)^2 - \frac{r h^2}{2} - \frac{r h}{\sigma}(\bar{x}_r - T) + (n - r) \log\left[\frac{1}{\sqrt{2\pi}} \int_{h}^{\infty} \exp(-z^2/2)\,dz\right]$$
where
$$\bar{x}_r = \frac{1}{r} \sum_{i=1}^{r} x_i.$$


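From (2.2) the maximum likelihood equations for estimation of h
and $\sigma$ are found to be

(2.2a)
$$\frac{\partial \log P(x_1, \cdots, x_r)}{\partial \sigma} = -\frac{r}{\hat\sigma} + \frac{1}{\hat\sigma^3} \sum_{i=1}^{r} (x_i - T)^2 + \frac{r \hat{h}}{\hat\sigma^2}(\bar{x}_r - T) = 0,$$
and

(2.2b)
$$\frac{\partial \log P(x_1, \cdots, x_r)}{\partial h} = \frac{r(T - \bar{x}_r)}{\hat\sigma} - r\hat{h} - (n - r)g(\hat{h}) = 0,$$
where
$$g(h) = \frac{\frac{1}{\sqrt{2\pi}} \exp(-h^2/2)}{\frac{1}{\sqrt{2\pi}} \int_{h}^{\infty} \exp(-z^2/2)\,dz}.$$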
Solving (2.2a) and (2.2b) for $\hat{h}$, there results after simplification,

(2.3)
$$g(\hat{h}) = \frac{-p}{(1 - p)V^2_{pn}} \left[(V^2_{pn} - 2)\hat{h} - 2\sqrt{\hat{h}^2 + V^2_{pn}}\right] = u(\hat{h}, V^2_{pn}),$$

where

$$p = \frac{r}{n},\qquad V^2_{pn} = 4\sum_{i=1}^{r} (x_i - T)^2 \Big/ r(T - \bar{x}_r)^2.$$

In Figures 1, 2, 3, 4, $\hat{h}$ is plotted against $V^2_{pn}$ for values of p at
intervals of 0.1 beginning with $p = 0.1$. Thus for given $V^2_{pn}$ and p, we can
read $\hat{h}$ from one of our charts and compute $\hat\sigma$ from either

(2.4a)
$$\hat\sigma = \frac{(T - \bar{x}_r)}{2}\left[-\hat{h} + \sqrt{\hat{h}^2 + V^2_{pn}}\right],$$
or
(2.4b)
$$\hat\sigma = r(T - \bar{x}_r) / [r\hat{h} + (n - r)g(\hat{h})],$$

derived from (2.2a) and (2.2b) respectively.
If we take derivatives with respect to h of g(h) and $u(h, V^2_{pn})$, we get

(2.5a)
$$\frac{du(h, V^2_{pn})}{dh} = \frac{-p}{(1 - p)V^2_{pn}} \left[(V^2_{pn} - 2) - \frac{2h}{\sqrt{h^2 + V^2_{pn}}}\right]$$
and

(2.5b)
$$\frac{dg(h)}{dh} = g(h)\,\frac{\frac{1}{\sqrt{2\pi}} \int_{h}^{\infty} (z - h)\exp(-z^2/2)\,dz}{\frac{1}{\sqrt{2\pi}} \int_{h}^{\infty} \exp(-z^2/2)\,dz}.$$

The right hand side of (2.5a) is always negative since the bracket
expression can be shown to always be positive. The right hand side of
(2.5b), on the other hand is clearly always positive. It follows that
$u(h, V^2_{pn})$ is monotone decreasing in h and g(h) is monotone increasing
in h. These two facts imply the uniqueness of our estimates and afford
a convenient method of iteration for values of p not shown in our charts.
Thus we can get an estimate of $\hat{h}$, say $h_1$, by interpolation in one of the
charts and insert this estimate in (2.2a). If $g(h_1) > u(h_1, V^2_{pn})$, then
$h_1 > \hat{h}$, while if $g(h_1) < u(h_1, V^2_{pn})$, $h_1 < \hat{h}$. Using these latter facts as
criteria, we can find $\hat{h}$ to any desired accuracy with very little labor.
Notice that this calculation requires only tables of ordinates and areas
of the unit normal distribution.

For truncation on the left (2.2a) applies but gives as solution not $\hat{h}$
but $-\hat{h}$.

The other slight difference for this case is that our estimates of $\sigma$
corresponding to (2.4a) and (2.4b) are

(2.6a)
$$\hat\sigma = \frac{r(T - \bar{x}_r)}{r\hat{h} - (n - r)g(-\hat{h})},$$
and
(2.6b)
$$\hat\sigma = \frac{(T - \bar{x}_r)}{2}\left[-\hat{h} - \sqrt{\hat{h}^2 + V^2_{pn}}\right],$$
respectively.


As an example we consider a situation in which truncation is on the
right, $n = 22$, $r = 13$, $V^2_{pn} = 25$. Since $p = .591$, our first estimate of h is
an interpolated one, say from Figure 3. Using linear interpolation we
get $h = -.114$. Beginning with this approximation we subsequently get
the results summarized in Table 1.

TABLE 1

  (1)         (2)        (3)                  (4)
  h           g(h)       $u(h, V^2_{pn})$     (2)-(3)
 -.114        .7266      .7322                -.0056
 -.11         .7291      .7269                +.0022
 -.111        .7285      .7282                +.0003
 -.1112       .7284      .7285                -.0001

Since we have used a 4-place table of normal ordinates and areas, any
further iteration than shown in Table 1 would likely give specious
accuracy.
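
The iteration of Section 2 lends itself to a short program. The sketch below is not the author's chart procedure: it replaces chart interpolation by plain bisection on $g(h) - u(h, V^2_{pn})$, which is justified by the monotonicity argument above, and it evaluates the normal area with the standard library's `math.erfc` instead of a 4-place table. All function names are hypothetical, and the bracket of -5 to 5 is an assumption that covers the range of the charts.

```python
import math


def g(h):
    """Normal ordinate at h divided by the upper-tail area beyond h."""
    ordinate = math.exp(-0.5 * h * h) / math.sqrt(2.0 * math.pi)
    upper_tail = 0.5 * math.erfc(h / math.sqrt(2.0))
    return ordinate / upper_tail


def u(h, v2, p):
    """Right-hand side of (2.3)."""
    return -p / ((1.0 - p) * v2) * ((v2 - 2.0) * h - 2.0 * math.sqrt(h * h + v2))


def solve_h(v2, p, lo=-5.0, hi=5.0, tol=1e-10):
    """Bisection on g(h) - u(h): g rises and u falls, so the root is unique."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > u(mid, v2, p):
            hi = mid  # g above u means mid exceeds the estimate
        else:
            lo = mid
    return 0.5 * (lo + hi)


def sigma_from_2_4a(h, v2, t_minus_mean):
    """Equation (2.4a); t_minus_mean is T minus the mean of the r observed values."""
    return 0.5 * t_minus_mean * (-h + math.sqrt(h * h + v2))


def sigma_from_2_4b(h, n, r, t_minus_mean):
    """Equation (2.4b)."""
    return r * t_minus_mean / (r * h + (n - r) * g(h))


n, r, v2 = 22, 13, 25.0
p = r / n
h_hat = solve_h(v2, p)
print(round(h_hat, 4), round(g(h_hat), 4))
```

Carried to full precision the root comes out near -.1126 rather than -.1112. The tabulated values in column (3) are reproduced when the factor $p/[(1 - p)V^2_{pn}]$ is rounded to .058, which seems to account for the small difference. At the root, (2.4a) and (2.4b) give the same $\hat\sigma$, which is a useful check on the solution.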


3. Construction of Chart for Estimation When Number of Observations in 
Truncated Portion is not Known. 


The sample likelihood for this situation is given by

(3.1)
$$P(x_1, \cdots, x_n) = \left[\frac{1}{p\left(\frac{T - \mu}{\sigma}\right)}\right]^{n} \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n} \exp\left[-\sum_{i=1}^{n} \frac{(x_i - \mu)^2}{2\sigma^2}\right].$$

If we put $\mu = T - h\sigma$ in (3.1), take logarithms of both sides and
differentiate with respect to $\sigma$ and h, we obtain as estimating equations,

(3.2a)
$$\frac{\partial \log P(x_1, \cdots, x_n)}{\partial \sigma} = -\frac{n}{\hat\sigma} + \frac{n\hat{h}}{\hat\sigma^2}(\bar{x} - T) + \frac{1}{\hat\sigma^3} \sum_{i=1}^{n} (x_i - T)^2 = 0,$$
and

(3.2b)
$$\frac{\partial \log P(x_1, \cdots, x_n)}{\partial h} = -\frac{n(\bar{x} - T)}{\hat\sigma} - n\hat{h} + n g(\hat{h}) = 0,$$

where
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i,\qquad g(h) = \frac{\exp(-h^2/2)}{\int_{h}^{\infty} \exp(-z^2/2)\,dz}.$$





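From (3.2a) and (3.2b) one gets as an estimating equation for $\hat{h}$,

(3.3)
$$g(\hat{h}) = \hat{h} + \frac{2}{V^2}\left(-\hat{h} + \sqrt{\hat{h}^2 + V^2}\right),$$

where

$$V^2 = \left[4\sum_{i=1}^{n} (x_i - T)^2\right] \Big/ n(\bar{x} - T)^2.$$

Solutions of (3.3) are given by Figure 5. An estimate of $\sigma$ can then be
obtained from either $\hat\sigma = [(\bar{x} - T)/2](\hat{h} + \sqrt{\hat{h}^2 + V^2})$, or $\hat\sigma = (\bar{x} - T)/[g(\hat{h}) - \hat{h}]$,
derived from (3.2a) and (3.2b) respectively.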
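
Where Figure 5 is not at hand, (3.3) can be solved numerically. The following is a sketch only: it uses bisection rather than the chart, the function names are hypothetical, and the bracket of -8 to 8 is an assumption chosen to cover somewhat more than the range $4.15 \le V^2 \le 7.7$ quoted in Section 1.

```python
import math


def g(h):
    """exp(-h^2/2) divided by the integral of exp(-z^2/2) from h to infinity."""
    upper = math.sqrt(math.pi / 2.0) * math.erfc(h / math.sqrt(2.0))
    return math.exp(-0.5 * h * h) / upper


def residual(h, v2):
    """Left side of (3.3) minus its right side."""
    return g(h) - h - (2.0 / v2) * (-h + math.sqrt(h * h + v2))


def solve_h(v2, lo=-8.0, hi=8.0, tol=1e-10):
    """Bisection; requires the residual to change sign between lo and hi."""
    if not (residual(lo, v2) > 0.0 > residual(hi, v2)):
        raise ValueError("V^2 lies outside the range bracketed by lo and hi")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid, v2) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def sigma_hat(h, v2, mean_minus_t):
    """First of the two estimates of sigma given after (3.3)."""
    return 0.5 * mean_minus_t * (h + math.sqrt(h * h + v2))


print(round(solve_h(4.4), 3), round(solve_h(7.2818), 3))
```

The two values printed correspond to the end points quoted for Hald's tables in Section 1: $V^2 = 4.4$ goes with a solution near -3.145, and a $V^2$ of about 7.3 goes with a solution near 2.0.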

The charts presented here were constructed in connection with [4] 
while the author was on the staff of the USAF School of Aviation 


Medicine. 
MULTISTATION INSPECTION SCHEMES*


GERALD J. LIEBERMAN 
Stanford University 


1. INTRODUCTION 


IN SAMPLING inspection by attributes, each item is classified simply as
defective or non-defective, with several different quality character-
istics ordinarily considered in making this classification. Very often, 
the types of possible defects are collected into groups and sampling 
plans are applied to each group separately. But even within a group 
there may be numerous equally important sources of defects. For ex- 
ample, in the manufacture of aircraft bolts, defects are classified as 
critical, major, minor A, and minor B. In the major group there are as 
many as 13 possible sources of defects, i.e., thread size and form, grip 
length, plating, etc. 

Characteristics also may be assigned to groups according to their 
position in the production process, e.g., in the manufacture of radio 
receivers natural inspection stations may arise during different stages 
of the process. In any case, for a given group of more or less equally 
important types of defects it may be desirable for the vendor or the 
consumer to inspect a lot for different types of defects during different 
stages of the production process. In the first place, lots that are certain 
to be eventually rejected should be caught early and subsequent fabri- 
cation costs saved. Lots of radio receivers that are “bad” should be 
rejected in the production process as early as possible. Secondly, manu- 
facturers are interested in the cause of trouble whenever any arises. If 
lots of aircraft bolts are rejected, with one of the thirteen characteristics 
mainly responsible (e.g., more than two imperfect threads), the manu- 
facturer wants to have this information. Finally, further operations in 
a process may hide defects. In the assembly of radio receivers, addi- 
tional operations could easily obscure a poorly soldered joint. Painting 
or plating could prevent identification of surface flaws. 

It often is necessary that, if characteristics are considered separately,
the resulting protection measured in terms of the Operating
Characteristic (OC) curve (i.e., for a given sampling procedure and incoming
quality, the probability of accepting the lot) should be the same as that
which would be achieved if all the characteristics were considered
together since, (1) the purchaser may be interested in over-all quality
within each group, (2) suppliers, though interested in individual
characteristics, may be striving for an over-all degree of quality, and (3)
two suppliers manufacturing the same product in parallel installations
(one inspecting all characteristics together and one inspecting
separately) should have the same quality requirements imposed on them.

* Work done under the sponsorship of the Office of Naval Research.

A set of sampling plans which are applied to individual characteris- 
tics of items in a given lot at several stations during the production 
process will be called a multistation inspection scheme. The term indi- 
vidual characteristic also applies to groups of characteristics when, for 
inspection purposes, such a group is considered as a single character- 
istic. Conceptually, every individual characteristic has a “station” as- 
sociated with it where inspection takes place for that characteristic 
only, although physically, all characteristics may be inspected at the 
same location. At each station, the lot may be rejected, and is finally 
accepted only if it is not rejected at any station. This paper considers 
the problem of finding a multistation inspection scheme whose OC 
curve (the probabilities of being accepted at all stations) matches the 
OC curve of a given (one station) sampling plan. The ensuing discussion 
is limited to single sampling plans. 

The desired protection is specified in terms of an OC curve based 
upon the over-all percentage defective. Each station has a percentage 
of defects which contributes to this total. The OC curve of the multi- 


station inspection scheme is a function of the individual percentage of 
defects at each station. Yet this OC curve should approximate the OC 
curve of the single station plan, regardless of the distribution of the 
given percentage defective among the stations. 


2. MULTISTATION INSPECTION PROCEDURE 


A multistation sampling procedure which provides the desired pro- 
tection as specified by the OC curve (sample size n and acceptance 
number a) of the one station plan is specified as follows: 

At each station, a random sample of n items is drawn from the lot. 
The lot is then accepted if the total number of defects for all stations 
does not exceed a. In other words, a lot is rejected as soon as the total 
number of defects for all the stations exceeds the acceptance number a. 
It should be emphasized that the acceptance number and sample size
of the plan are the same as those of the original one station plan.

If the probability of obtaining defects at the separate stations is
assumed to be independent,¹ the resulting OC curve for this procedure is a
function of the percentage of defects at each station. However,
regardless of the manner in which the defects are distributed, the OC curve
of the above plan approximates the desired protection very closely. In
fact, the probability of acceptance, given a total percentage defective
(of all characteristics), is (a) never less than the OC curve computed
for the multistation plan with equal percentage of defects at each
station, and (b) never greater than the original OC curve, and of course is
equal to it when all the percentage defective is concentrated at one
station. Thus, if the distribution of the percentage defective is
unknown, the OC curve of the multistation sampling plan is given by a
band, the lower boundary being the OC curve for the plan when each
station has an equal percentage of defects, and the upper boundary
being the original desired protection (OC curve).

1 If $p_i$ is the percentage of defects at the ith station, the probability that a single randomly chosen
item will not contain a defect in any characteristic is $(1 - p_1)(1 - p_2) \cdots (1 - p_k)$ so that the total
percentage defective is $p = 1 - \prod_{i=1}^{k} (1 - p_i)$.

Table 1 shows that the band is so narrow that for all practical pur- 
poses, the OC curve of the multistation plan is the same as that of the 
original (one station) curve. 
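
The two edges of the band can be computed directly under the independence assumption, taking the station counts as independent binomial variables. The sketch below is illustrative only: the function names are hypothetical, and the acceptance number a = 1 used for the Letter E, n = 10 plan is an assumption, chosen because it reproduces the tabulated probabilities of .975 and .086.

```python
from math import comb


def binomial_pmf(n, p):
    """Probabilities of 0, 1, ..., n defects in a sample of n."""
    return [comb(n, d) * p**d * (1.0 - p) ** (n - d) for d in range(n + 1)]


def oc_single(n, a, p):
    """One station plan: probability of at most a defectives in a sample of n."""
    return sum(binomial_pmf(n, p)[: a + 1])


def oc_multistation(n, a, station_p):
    """Accept when the defects summed over all stations do not exceed a.

    The station counts are treated as independent binomial variables.
    """
    total = [1.0]  # distribution of the running total of defects
    for p_i in station_p:
        pmf = binomial_pmf(n, p_i)
        new = [0.0] * (len(total) + n)
        for s, prob_s in enumerate(total):
            for d, prob_d in enumerate(pmf):
                new[s + d] += prob_s * prob_d
        total = new
    return sum(total[: a + 1])


def equal_split(p, k):
    """Per-station fraction giving over-all fraction p when all k are equal."""
    return 1.0 - (1.0 - p) ** (1.0 / k)


n, a = 10, 1  # a = 1 is an assumed acceptance number, see the note above
for p in (0.025, 0.35):
    upper = oc_single(n, a, p)
    lower = [oc_multistation(n, a, [equal_split(p, k)] * k) for k in (2, 3, 4, 5)]
    print(round(upper, 3), [round(x, 3) for x in lower])
```

An unequal division of the same over-all fraction defective, for example .02 at one station and the remainder at a second, gives a probability of acceptance between the two edges.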

The one station plans that are being matched in Table 1 are plans 
taken from Military Standard 105A? with the percentage defective 
values tabulated such that their probabilities of acceptance were ap- 
proximately 95% and 10%. The corresponding values for the multi- 
station plans were calculated with each station having an equal per- 
centage of defects. Whereas values are given for a maximum of 5 sta- 
tions, the procedure could have been extended to any number of sta- 
tions with the same gratifying results. The value of the upper bound 
of the band (the original OC curve or the multistation OC curve with 
the percentage defective concentrated at one station) is given in the 
column headed Probability of Acceptance for MIL-STD-105A Plan. 

It appears from the table that a possible improvement of the solu- 
tion would be to increase the OC curve by increasing the total allow- 
able number of defects for all stations. Unfortunately, even an increase 
of 1 allowable defect would raise the lower bound of the band above the 
original desired OC curve to such an extent that it no longer would 
remain the excellent approximation that it was under the above pro- 
cedure. 

A multistation plan does exist where the actual original OC curve 
(n, a) can be obtained by modifying the sample size at each station ac- 
cording to the number of defects found at the previously inspected 
stations, i.e., the sample size at any station is the sample size (n) of the
desired OC curve less the sum of the defects found at the previously 





2 This standard establishes sampling plans and procedures for inspection by attributes for use in the
determination of acceptability of products procured by the government. In the remainder of this paper 
it will be referred to as “MIL-STD-105A.” 
TABLE I

MULTISTATION SAMPLING PROCEDURE
At each station, a random sample of n items is drawn from the lot. The lot
is then accepted if the total number of defects in all the stations does not exceed
a, where a and n are the acceptance number and the sample size, respectively,
of the given (one station) sampling plan from MIL-STD-105A that is being
approximated. In other words, a lot is rejected as soon as the total number of
defects for all the stations exceeds the acceptance number a.

Examples of Bounds for the OC Curve of the Multistation Inspection Scheme

                                               Prob. of          Lower bound for the probability of acceptance
                                 Percentage    acceptance for    of the Multistation Inspection Scheme†
Plan                             Defective     MIL-STD-105A
                                               Plan*             2 Stations   3 Stations   4 Stations   5 Stations

Letter E, AQL 6.5%, n = 10        2.5          .975              .974         .974         .973         .973
                                 35            .086              .078         .076         .075         .074
Letter G, AQL 4%, n = 25          3            .962              .960         .959         .959         .959
                                 20            .098              .091         .088         .087         .086
Letter H, AQL 4%, n = 35          4            .950              .946         .945         .945         .944
                                 18            .103              .093         .090         .089         .088
Letter J, AQL 4%, n = 75          4            .969              .966         .965         .965         .965
                                 13            .129              .116         .112         .110         .109
Letter K, AQL 4%, n = 110         4            .967              .964         .963         .962         .962
                                 12            .077              .068         .065         .064         .063
Letter L, AQL 4%, n = 150         4.5          .961              .956         .954         .953         .953
                                 11            .091              .079         .076         .074         .073
Letter N, AQL 1%, n = 300         1.25         .963              .962         .962         .962         .962
                                  3.75         .123              .119         .118         .117         .117
Letter P, AQL 1%, n = 750         1.3          .960              .959         .958         .958         .958
                                  2.75         .123              .119         .117         .117         .115

* This is also the upper bound for the probability of acceptance for the Multistation Inspection
Scheme, which occurs when all the defective is concentrated at one station.
† This lower bound occurs with equal percentage of defects at each station.


inspected stations.³ As before, a lot is rejected as soon as the total
number of defects exceeds a. Changing the sample size at each station,
according to the number of defects observed, requires somewhat more

3 This result was communicated to the author by Jack Laderman, Office of Naval Research,
Washington, D. C.
complicated instructions to the inspectors. It is questionable whether 
the change in protection is worthwhile in view of the narrowness of the 
OC band of the recommended plan. 

It is possible that the probability of obtaining a defect for each char- 
acteristic may be somewhat correlated, i.e., not independent. This will 
tend to make the resulting OC curve for the multistation plan stricter 
than in the case of independence. An extreme occurs when there is 
perfect correlation, i.e., if an item contains a defect in one character- 
istic, it contains a defect in every characteristic. The OC curve for this 
case lies well below the desired OC curve. 
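
The extreme case of perfect correlation is easy to evaluate. The sketch below assumes that the same n items are inspected at every station, so that each defective item contributes one defect at each of the k stations; the function name is hypothetical, and the acceptance number a = 1 for the Letter E, n = 10 plan is again an assumption.

```python
from math import comb


def oc_perfect_correlation(n, a, p, k):
    """Probability of acceptance when a defective item is defective in all k
    characteristics: d defective items give k*d defects, accepted if k*d <= a."""
    limit = a // k
    return sum(comb(n, d) * p**d * (1.0 - p) ** (n - d) for d in range(limit + 1))


# Letter E plan (n = 10, assumed a = 1), 2.5 per cent defective, two stations.
print(round(oc_perfect_correlation(10, 1, 0.025, 2), 3))
```

With two stations a single defective item already produces two defects, so the lot passes only when the sample contains no defective item at all, and the probability of acceptance drops from about .975 to about .776.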

From a practical standpoint, the advantages of the multistation 
sampling scheme developed in this paper are, (1) it is administratively 
simple, (2) it can be used in conjunction with MIL-STD-105A, and 
(3) it is essentially independent of the distribution of the total per- 
centage defective among the stations. 


3. DERIVATION OF RESULTS 


Let p be the over-all percentage defective in the lot. Then $q = 1 - p$ is
the over-all percentage of non-defectives in the lot. Similarly, define
$p_i\ (i = 1, 2, \cdots, k)$ as the percentage of defects at the ith station. Then
$q_i = 1 - p_i$ is the percentage of non-defects at the ith station.

Let L(p) be the original over-all OC curve, and L*(p) be the resultant 
OC curve obtained by applying the multistation inspection scheme at 
each of the stations. Then it is desired that 


(1) L(p) = L*(p). 
It is clear that 
PSP 


with equality holding when all the p,’s are perfectly correlated. For the 
present, assume that the probability of obtaining defects at the sepa- 
rate stations is independent, i.e., the probability that a single randomly 
chosen item will not contain a defect in any characteristic is 


$(1 - p_1)(1 - p_2) \cdots (1 - p_k)$

so that the total percentage defective is

(2) $p = 1 - \prod_{i=1}^{k} (1 - p_i)$

whereas the total percentage non-defective is

$q = q_1 q_2 \cdots q_k$.
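As a small numerical illustration of equation (2), the sketch below computes the over-all percentage defective from station percentages under the independence assumption. The function name and the three station rates are hypothetical and are not taken from the paper.

```python
from math import prod

def overall_defective(station_rates):
    """Equation (2): p = 1 - prod(1 - p_i), assuming independent stations."""
    return 1 - prod(1 - p_i for p_i in station_rates)

# Hypothetical rates at k = 3 stations (illustrative only, not from the paper).
rates = [0.01, 0.02, 0.03]
p = overall_defective(rates)
q = prod(1 - p_i for p_i in rates)  # total percentage non-defective
print(f"p = {p:.6f}, q = {q:.6f}, p + q = {p + q:.6f}")
```

Under independence the over-all rate lies between the largest station rate and the sum of the station rates, which is a quick sanity check on any set of inputs.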
Under the independence assumption, it is correct, although not necessary,
to use the same sample at each station.‘ The ensuing proofs make
use of this fact.

Since no assumption is made as to how the percentage defective is 
distributed, a plan [L*(p)] is sought that will approximate the original 
OC curve regardless of the distribution of the percentage defective. 
Moreover, if all the defects are concentrated at one station, the plan 
should result in the original desired OC curve [L(p)]. The sampling 
procedure developed has these properties. 

It is clear that if all the defects are concentrated at one station, the 
OC curve of the multistation plan reduces to the original given OC 
curve. In fact, for any division of the proportion defective, 


$L^*(p) \le L(p)$


and $L^*(p)$ is a minimum when $p_1 = p_2 = \cdots = p_k$.

Theorem 1. The OC curve of the multistation plan [L*(p)] never ex- 
ceeds the original given OC curve [L(p)]. 

Proof. Let $d$ be the total number of the defective items in a sample of
size $n$, and $d_i$ be the number of defects at the $i$th station in the sample of
size $n$; then

$L(p) = \mathrm{Prob}\{d \le a\}$

and

$L^*(p) = \mathrm{Prob}\{d_1 + d_2 + \cdots + d_k \le a\}$.

Now

$d_1 + d_2 + \cdots + d_k \le a$ implies $d \le a$

since

$d_1 + d_2 + \cdots + d_k \ge d$.

Hence

(3) $\mathrm{Prob}\{d_1 + d_2 + \cdots + d_k \le a\} \le \mathrm{Prob}\{d \le a\}$

or

$L(p) \ge L^*(p)$. Q.E.D.
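Theorem 1 can be checked numerically. The following is a minimal sketch, assuming independent binomial defect counts at each station with a common sample size $n$ and acceptance number $a$; the function names and the plan parameters are hypothetical.

```python
from math import comb, prod

def binom_pmf(n, p):
    """Probabilities of 0..n defects in n independent trials."""
    return [comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(n + 1)]

def convolve(u, v):
    """Distribution of the sum of two independent counts."""
    out = [0.0] * (len(u) + len(v) - 1)
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            out[i + j] += ui * vj
    return out

def oc_overall(p, n, a):
    """L(p) = Prob{d <= a}, with d binomial (n, p)."""
    return sum(binom_pmf(n, p)[: a + 1])

def oc_multistation(station_rates, n, a):
    """L*(p) = Prob{d_1 + ... + d_k <= a}, with independent binomial d_i."""
    total = [1.0]
    for p_i in station_rates:
        total = convolve(total, binom_pmf(n, p_i))
    return sum(total[: a + 1])

# Hypothetical plan and station rates (illustrative only).
n, a = 50, 2
rates = [0.01, 0.02, 0.03]
p = 1 - prod(1 - r for r in rates)
print(f"L(p)  = {oc_overall(p, n, a):.5f}")
print(f"L*(p) = {oc_multistation(rates, n, a):.5f}")
print(f"L*(p), all defects at one station = {oc_multistation([p, 0.0, 0.0], n, a):.5f}")
```

When all defects are placed at a single station the two curves coincide, as the text states; any other division gives a smaller probability of acceptance.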
Theorem 2. $L^*(p)$ is a minimum when $p_1 = p_2 = \cdots = p_k$.





‘It is assumed throughout the paper that the universe is large compared to the sample drawn. 
Proof. Suppose $k = 2$ and $q = q_1 q_2$. Define

(4) $L^*(p_1, p_2, T, n) = \mathrm{Prob}\{d_1 + d_2 \le T - 1 \mid n\}$

= OC curve of the two station plan, where $T - 1$ = acceptance number
and $n$ = sample size.

Now

$\mathrm{Prob}\{d_1 = j\} = \binom{n}{j} p_1^{j} q_1^{n-j}$

and

$\mathrm{Prob}\{d_2 = j\} = \binom{n}{j} p_2^{j} q_2^{n-j}$.

Denote by $f(\theta)$ the characteristic function of the sum of the two random
variables, i.e.,

(5) $f(\theta) = (p_1 e^{i\theta} + q_1)^n (p_2 e^{i\theta} + q_2)^n$.

Applying the inversion formula to (5),

$\mathrm{Prob}\{d_1 + d_2 = j \mid n\} = \frac{1}{2\pi} \int_0^{2\pi} e^{-i\theta j} f(\theta)\, d\theta$.

It then follows that

(6) $\mathrm{Prob}\{d_1 + d_2 \le T - 1 \mid n\} = \frac{1}{2\pi} \sum_{j=0}^{T-1} \int_0^{2\pi} e^{-i\theta j} f(\theta)\, d\theta = \frac{1}{2\pi} \int_0^{2\pi} \frac{1 - e^{-i\theta T}}{1 - e^{-i\theta}} f(\theta)\, d\theta$.

Substituting (5) into (6),

(7) $\mathrm{Prob}\{d_1 + d_2 \le T - 1 \mid n\} = \frac{1}{2\pi} \int_0^{2\pi} \frac{1 - e^{-i\theta T}}{1 - e^{-i\theta}} (p_1 e^{i\theta} + q_1)^n (p_2 e^{i\theta} + q_2)^n\, d\theta$.





Taking the partial derivative of (7) with respect to $q_1$,

(8) $\frac{\partial\, \mathrm{Prob}\{d_1 + d_2 \le T - 1 \mid n\}}{\partial q_1} = n p_2\, \mathrm{Prob}\{d_1 + d_2 = T - 2 \mid n - 1\} + n q_2\, \mathrm{Prob}\{d_1 + d_2 = T - 1 \mid n - 1\}$.

Taking the partial derivative of (7) with respect to $q_2$,

(9) $\frac{\partial\, \mathrm{Prob}\{d_1 + d_2 \le T - 1 \mid n\}}{\partial q_2} = n p_1\, \mathrm{Prob}\{d_1 + d_2 = T - 2 \mid n - 1\} + n q_1\, \mathrm{Prob}\{d_1 + d_2 = T - 1 \mid n - 1\}$.

Taking the total derivative of (7) with respect to $q_1$ subject to the condition
$q = q_1 q_2$,

(10) $\frac{d\, \mathrm{Prob}\{d_1 + d_2 \le T - 1 \mid n\}}{d q_1} = \frac{n(q_1 - q_2)}{q_1} \left[\mathrm{Prob}\{d_1 + d_2 = T - 2 \mid n - 1\}\right]$.


Now $\frac{n}{q_1}\left[\mathrm{Prob}\{d_1 + d_2 = T - 2 \mid n - 1\}\right]$ is a positive number and hence
(10) changes sign at $q_1 = q_2$, but it was shown previously that


$L(p) \ge L^*(p)$

where equality holds when

$q_1 = q$ or $q_2 = q$, so that $q_1 = q$ and $q_2 = q$

are upper bounds.
Therefore since $L^*(p)$ is continuous,

$q_1 = q_2 = q^{1/2}$

must be the value of $q_1$ and $q_2$ which minimizes $L^*(p)$.
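The two-station argument can be verified numerically. The sketch below, which assumes independent binomial counts at the two stations and uses hypothetical plan parameters, compares the right-hand side of (10) with a central finite difference of $L^*$ along the constraint $q = q_1 q_2$, and shows the sign change at $q_1 = q_2$. Here $a = T - 1$ is the acceptance number, so the probability in (10) is taken at $a - 1$ defects.

```python
from math import comb, sqrt

def binom_pmf(n, p):
    """Probabilities of 0..n defects in n independent trials."""
    return [comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(n + 1)]

def sum_pmf(n, q1, q2):
    """Distribution of d1 + d2 for samples of size n at each of two stations."""
    u, v = binom_pmf(n, 1 - q1), binom_pmf(n, 1 - q2)
    out = [0.0] * (2 * n + 1)
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            out[i + j] += ui * vj
    return out

def oc_two_station(q1, q, n, a):
    """L* = Prob{d1 + d2 <= a | n}, with q2 fixed by the constraint q = q1 * q2."""
    return sum(sum_pmf(n, q1, q / q1)[: a + 1])

def slope_eq10(q1, q, n, a):
    """Right-hand side of (10) with T - 1 = a, so T - 2 = a - 1 (needs a >= 1)."""
    q2 = q / q1
    return n * (q1 - q2) / q1 * sum_pmf(n - 1, q1, q2)[a - 1]

# Hypothetical plan: n = 20, a = 2, over-all non-defective fraction q = 0.90.
n, a, q, h = 20, 2, 0.90, 1e-6
for q1 in (0.91, sqrt(q), 0.97, 0.99):
    numeric = (oc_two_station(q1 + h, q, n, a) - oc_two_station(q1 - h, q, n, a)) / (2 * h)
    print(f"q1={q1:.4f}  L*={oc_two_station(q1, q, n, a):.5f}  "
          f"(10)={slope_eq10(q1, q, n, a):+.5f}  finite diff={numeric:+.5f}")
```

The slope is negative for $q_1 < q_2$ and positive for $q_1 > q_2$, so the smallest value of $L^*$ in the printed rows occurs at $q_1 = q^{1/2}$.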
For $k$ stations

$q = q_1 q_2 \cdots q_k$.

It is necessary to prove that $L^*(p)$ is minimized when

$p_1 = p_2 = \cdots = p_k$, i.e., $q_1 = q_2 = \cdots = q_k = q^{1/k}$.


Suppose there are at least two $q_i$'s, $q_s$ and $q_t$, $q_s \ne q_t$, such that $L^*(p)$
is minimized. Denote this OC curve by $L^*(p)$.
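$L^*(p) = \mathrm{Prob}(d_1 + d_2 + \cdots + d_k \le a \mid q_1, \ldots, q_k)$

$= \sum_{M=0}^{a} \mathrm{Prob}\left\{\sum_{i=1,\, i \ne s, t}^{k} d_i = M \;\middle|\; q_1, \ldots, q_{s-1}, q_{s+1}, \ldots, q_{t-1}, q_{t+1}, \ldots, q_k\right\} \cdot \mathrm{Prob}(d_s + d_t \le a - M \mid q_s, q_t)$.

Let $v = q_s q_t$. From the above results

$\mathrm{Prob}(d_s + d_t \le a - M \mid q_s, q_t) \ge \mathrm{Prob}(d_s + d_t \le a - M \mid \sqrt{v}, \sqrt{v})$

and strict inequality holds if $M \ne a$.
Hence $L^*(p)$ is greater than

$\sum_{M=0}^{a} \mathrm{Prob}\left\{\sum_{i=1,\, i \ne s, t}^{k} d_i = M \;\middle|\; q_1, \ldots, q_{s-1}, q_{s+1}, \ldots, q_{t-1}, q_{t+1}, \ldots, q_k\right\} \cdot \mathrm{Prob}(d_s + d_t \le a - M \mid \sqrt{v}, \sqrt{v})$

since

$\mathrm{Prob}(M \ne a) > 0$,

which is a contradiction, and hence $L^*(p)$ must be minimized when
$p_1 = p_2 = \cdots = p_k$.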
Q.E.D. 

Throughout the above proofs, independence of defects was assumed. 
The effect of correlation would tend to make the OC curve for the 
multistation plan stricter than that obtained under the assumption of 
independence. For the extreme case of perfect correlation, the OC 
curve is as follows: 


$\mathrm{Prob}(d_1 + \cdots + d_k \le a) = \mathrm{Prob}(r \ge kn - a)$


where $r$ is the number of non-defectives in a total sample of $kn$. The
strictness of the plan is easily seen in comparing it to the OC curve of
the original over-all plan, i.e.,


$\mathrm{Prob}(r \ge n - a)$.
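The comparison for the perfectly correlated case can be sketched numerically. The code below reads the two displayed probabilities literally, treating the number of defectives as binomial over $kn$ items for the multistation plan and over $n$ items for the over-all plan; this reading, the function names, and the plan parameters are assumptions of the sketch rather than statements from the paper.

```python
from math import comb

def binom_cdf(a, n, p):
    """Prob{at most a defectives among n items}, each defective with probability p."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(a + 1))

def oc_overall(p, n, a):
    """Prob(r >= n - a): at most a defectives in a sample of n."""
    return binom_cdf(a, n, p)

def oc_perfect_correlation(p, n, k, a):
    """Prob(r >= kn - a): at most a defectives in a total sample of kn."""
    return binom_cdf(a, k * n, p)

# Hypothetical plan: n = 50 per station, k = 3 stations, acceptance number a = 2.
n, k, a = 50, 3, 2
for p in (0.005, 0.01, 0.02, 0.04):
    print(f"p={p:.3f}  over-all={oc_overall(p, n, a):.4f}  "
          f"perfect correlation={oc_perfect_correlation(p, n, k, a):.4f}")
```

At every $p$ the second column is smaller, which is the stricter curve described in the text.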
SOME CONSIDERATIONS IN THE USE OF THE
RESIDUAL METHOD OF ESTIMATING
NET MIGRATION*


JACOB S. SIEGEL
Bureau of the Census


AND


C. HORACE HAMILTON
North Carolina State College


Census net migration data and estimates of net migration 
obtained by the residual method, representing the difference 
between total population change and natural change during 
a period, are compared, and some general problems in the use 
of the residual method are discussed. Several residual meth- 
ods—the vital statistics method and the forward, reverse, and 
average survival rate methods—are described, compared, and 
evaluated. On the basis of a symbolic model representing the 
population in an age group in terms of migration cohorts, it is 
shown how the various survival rate formulas, unlike the vital 
statistics method, fail to make an accurate allowance for the 
net migration of persons who die during the migration period, 
except under very restricted conditions of migration. The 
maximum theoretical errors in the use of the various survival 
rate formulas, resulting from the inability of survival rates to 
measure deaths occurring in an area exactly, and the theoreti- 
cal errors under different conditions of timing of migration, 
are developed, and suggestions are given as to how the prob- 
lem may be handled. 


IN RECENT discussions by members of the Committee on Migration
Differentials of the Social Science Research Council, the results of
which have not been published, some attention has been given to the 
theory and method of estimating the volume of net migration during a 
census decade. The present writers have undertaken to examine this 
problem further, partly at the suggestion of the Committee. The pur- 
pose of this article is to make available at once the preliminary results 
of this work to those research workers who are planning migration 
studies based on the 1950 Census data. Further development of the 
work would involve empirical testing of the methods discussed, including
the measurement of the differences between them under a variety
of conditions, but it has not been possible to carry through the analysis
to this point for the present article.


* An adaptation of a paper presented at the 1951 annual meeting of the American Statistical Association
in Boston on December 28, 1951, before a session sponsored jointly by the Association and the
Rural Sociological Society.

Published with the approval of the Director of the North Carolina Agricultural Experiment Station
as paper No. 415 of the Journal Series.


CENSUS MIGRATION AND THE RESIDUAL METHOD 


This article is not primarily concerned with the use of the migration 
data which are to be provided by the 1950 Census, or with census 
migration data in general. These are not estimates (except insofar as 
they may be based on a sample) but tabulations of the replies of re- 
spondents as to their place of residence at some specified previous date. 
We are here primarily concerned with estimates of the volume of net 
migration into and out of specific areas or population groups developed 
by the so-called residual method, which, basically, involves removing 
natural increase or decrease (the balance of births and deaths) during 
a period from net total population change during the period. 


Census migration 


The need for developing and improving techniques of estimating the 
volume of migration is still great even though an increasing body of 
migration data is becoming available from censuses and sample sur- 
veys. The 1950 Census data on internal migration, for example, will 
provide an inadequate picture of recent migration trends. In addition 
to being based on a sample, these data will relate to one relatively short 
period (April 1, 1949, to April 1, 1950). Moreover, as is characteristic 
of census migration data in general, they will fail to include the move- 
ments of several groups—persons who were born or died or left the 
country during the migration period—or to indicate the return and 
multiple movements of the same individual. The census will, however, 
provide separate counts of in-migrants and out-migrants, and informa- 
tion on the origin and destination of migrants and on their personal, 
social, and familial characteristics, for a rather limited number of types 
of areas. Data on total in- and out-migration, for example, will probably 
be published or available for states, “state economic areas,” and the 
farm and nonfarm segments of each of these areas. Hence, data on net 
migration will also be available for these areas and combinations of 
them.¹





1 It is important in this general connection to distinguish between migration as a composition
characteristic and migration as a factor in population change. The former point of view stresses the
migration status of the population at a particular date and the personal and social characteristics of the
individuals classified as migrants. The latter point of view is primarily concerned with the volume of
migration into and out of specific areas or population groups as it affects their population growth and
decline. The census material is oriented largely toward the former point of view.
The residual method


The residual method cannot directly provide separate information on 
in-migration and out-migration or on the origin and destination of 
migrants. However, the method can measure the balance of the move- 
ments of all migrating groups, and an estimate can readily be developed 
by it (1) for any period and population group which has a constant 
characteristic (e.g., a sex group or the total population of a given geo- 
graphic or residence area) or a characteristic varying in a fixed way 
(e.g., an age group), and (2) for which two census counts, or population 
estimates, and an estimate of natural increase or decrease are available 
or can be developed. It is anticipated, therefore, that there will be con- 
siderable interest in measuring net migration by the residual method for 
many population groups for the 1940-1950 decade. The method is of 
special importance in reconstructing historical trends in net migration. 
It will probably remain of great importance at least until a continuous 
register system of population accounting is adopted in this country, as 
has been done in several European countries. 

There are two major variants of the residual method: (1) The vital 
statistics method, which employs birth and death statistics to allow 
for natural increase or decrease, and (2) the survival rate method, 
which, as it is generally applied, employs life table survival rates to 


allow for mortality and birth statistics to allow for natality. Several 
variations of the latter method may be distinguished: (a) The forward 
survival rate method, (b) the reverse survival rate method, and (c) the 
average survival rate method. These various methods will be described, 
compared, and evaluated later. 


Problems in the use of the residual method 


As a residual, an estimate of net migration obtained by any of the
residual methods inherits the error involved in estimating natural increase
or decrease and in measuring net total population change, although
the errors may offset one another wholly or in part.²

The estimation of the true number of births and deaths and the se- 
lection or computation of the proper life table survival rates, for the 
particular period and population group for which net migration is to 
be measured by the residual method, involve special problems which 
we propose to comment upon only briefly, mostly by illustration. 
Births and deaths have not been, and are not now, tabulated for some 





2 A general statement of some of the limitations of residual estimates has been given by Truesdell
[1].
important types of population groups (e.g., the farm population), or in
the detail necessary to make certain types of migration estimates (e.g., 
age for cities). For small areas it is especially important to use data on 
a residence basis or to adjust the data to a residence basis; but it is only 
in the last decade that most vital statistics were tabulated in this way. 
Estimates of the completeness of birth registration are now available 
on a less-than-state-wide basis for 1940 only, although such estimates 
are now being developed by the National Office of Vital Statistics for 
1950. Reliable estimates of the completeness of death registration are 
not available either for the country as a whole or for any geographic 
area within it. Life tables for cities and counties are not available and 
cannot be constructed directly because data on the age distribution of 
deaths for these types of areas are not available. Even if the necessary 
data were available, reliable life tables could hardly be prepared for 
the smaller areas because of the fluctuations involved in small numbers
and the inadequate record of exposure to risk of death. As a result, sur- 
vival rates must be taken from life tables for broader areas or de- 
veloped from them, possibly by some adjustment or weighting scheme; 
or “census survival rates,” based on population census figures by age
and sex for the United States as a whole and adjusted to local mortality 
experience, may be used. The available official life tables for the United 
States, regions, states, urban-rural areas, and city-size groups may be 
used for these purposes. 

In addition, there are problems with respect to the estimates or 
counts of population. If the figures are based on samples, sampling 
variability must be taken into account. Information as to the complete- 
ness and accuracy of census or survey figures and of non-survey esti- 
mates based on census figures is extremely limited; the 1950 Census 
post-enumeration survey will provide some information of this kind 
but probably only down to the level of regions. Changes in the defini- 
tion of the population covered may have to be considered (e.g., change 
between the 1940 and 1950 Censuses in the definition of urban and 
rural and in the way the usual residence of college students was de- 
fined). In addition, in working with certain population groups, addi- 
tional factors may have to be taken into account, such as annexations 
in the case of some cities, changes in military strength and distribution 
in the case of the civilian population, and the shift of land from farm 
to nonfarm use in the case of the farm population. 

We have chosen to focus special attention in this paper on a particu- 
lar theoretical problem—the differences in the various estimates of net 
migration resulting from the use of the different procedures for estimating
mortality associated with each of the four residual methods listed
above. For simplicity, therefore, we shall assume in the following discussion
(1) that the census or survey data or population estimates are exact,
(2) that there are available, or it is possible to develop, exact estimates of
the number of births and deaths, or life table survival rates which measure
the force of mortality exactly, for any period or population group for which
a residual estimate is being made, and (3) that changes in definitions and
special factors (other than the population totals and mortality and
natality) are not involved.


A SYMBOLIC REPRESENTATION OF MIGRATION COHORTS 


As a basis for the discussion which follows, a symbolic model is presented
in Table 1.³ This table shows the components of the migration
process for age cohorts alive at the start of a migration period, here- 
after referred to often as a decade because of the usual application to 
10-year intercensal periods. For cohorts born during the decade, a 
parallel symbolic model could be set up. The two census figures are 
symbolized as follows: 


$P_0$ = the enumerated initial population in an age group at the beginning
of the decade.

$P_1$ = the enumerated final population at an age ten years greater than
for the initial population at the end of the decade.


It is convenient to think of the migration process in terms of what 
may be called “age cohorts by migration status.” For each age cohort, 
therefore, several migration status subcohorts are distinguished in the 
table: a non-migrating subcohort, identified by subscript $n$; an
in-migrating subcohort, identified by subscript $i$; and an out-migrating
subcohort, identified by subscript $o$. $C$ is used to indicate the size of
each age-migration cohort at the beginning of the decade, L the size 
of each cohort at the end of the decade, and D (with the appropriate 
subscript) the number of deaths occurring to each cohort during the 
decade. The C’s are thus distinguished as follows: 


$C_n$ = that part of $P_0$ which does not migrate;

$C_i$ = an in-migrating cohort living outside the area at the beginning
of the decade; and

$C_o$ = an out-migrating cohort making up a part of $P_0$ at the beginning
of the decade.





3 A somewhat different symbolic representation and analysis of the components of the migration
process from those presented here have been given by Hutchinson [2].
$L_n$, $L_i$, and $L_o$, respectively, represent the survivors of $C_n$, $C_i$, and $C_o$ at
the end of the decade, and $D_n$, $D_i$, and $D_o$ symbolize the total number
of deaths among each migration cohort during the decade.

We can next consider persons alive at the end of the period who 
migrated during the period ($L_i$ and $L_o$) as survivors of a cohort of
persons alive at the beginning of the period ($C_i$ and $C_o$), part of whom
died before they had a chance to migrate, part of whom died after
migrating, and the remainder of whom survived. Deaths among the
in-migrating cohort ($D_i$) may thus be subdivided into


$D_i^a$ = deaths of persons among the in-migrating cohort after they have
migrated


and


$D_i^b$ = deaths of persons among the in-migrating cohort who have not
migrated,


and deaths among the out-migrating cohort ($D_o$) may be subdivided
into


$D_o^a$ = deaths of persons among the out-migrating cohort after they
have migrated


and


$D_o^b$ = deaths of persons among the out-migrating cohort who have
not migrated.


The reason for including $D_i^b$ and $D_o^a$ is to account completely (balance
the population change books!) for the population of the assumed cohorts
at the beginning and at the end of the period, alive and dead,
migrating and not migrating, living inside and living outside of a given
area.


DEFINITION OF NET MIGRATION 


We can now use the symbols in our model to distinguish several 
definitions of net migration. We shall call them exact net migration, 
net population change due to migration, and census net migration. 


Definition of “exact net migration” 


“Exact net migration” is offered as a standard for the comparison, 
given later, of the estimates of net migration provided by the various 
residual methods. It is defined as the difference between the total number 











































TABLE 1

SYMBOLIC REPRESENTATION OF AN AGE COHORT BY
MIGRATION STATUS AND OF THE COMPONENTS
OF NET MIGRATION

(This model depicts only age cohorts alive at the start of the migration period,
but would be essentially the same for the newborn cohorts. Dashes indicate that
the category is not considered relevant or that the number is zero. Blanks indicate
that additional symbols are not considered desirable)

Item                                Total        Non-migrant   In-migrant    Out-migrant
                                                 cohort        cohort        cohort

SELECTED COMPONENTS OF POPULATION CHANGE
Enumerated initial population
  Within area                       $P_0$        $C_n$         --            $C_o$
  Outside area                      $C_i$        --            $C_i$         --
Deaths during decade                                           $D_i$         $D_o$
  Within area                       $D$          $D_n$         $D_i^a$       $D_o^b$
  Outside area                                                 $D_i^b$       $D_o^a$
Enumerated final population
  Within area                       $P_1$        $L_n$         $L_i$         --
  Outside area                      $L_o$        --            --            $L_o$

COMPONENTS OF NET MIGRATION
Exact net migration                 $M$ =                      $M_i$         $- M_o$
  Living at end of decade                                      $L_i$         $- L_o$
  Died during decade                                           $D_i^a$       $- D_o^a$
Estimated net migration
  Forward method                    $M_2$* =                   $L_i$         $- L_o$
    Living at end of decade                                    $L_i$         $- L_o$
    Died during decade                                         --            --
  Reverse method                    $M_3$† =                   $C_i$         $- C_o$
    Living at end of decade                                    $L_i$         $- L_o$
    Died during decade                                         $D_i$         $- D_o$
  Average method                    $M_4$‡ =                   [illegible]   [illegible]
    Living at end of decade                                    $L_i$         $- L_o$
    Died during decade                                         [illegible]   [illegible]

* See proof on page 491.
† See proof on page 489.
‡ Based on formulas for $M_2$ and $M_3$.
of persons ever entering the area during the decade and the total number of
persons ever leaving the area during the decade. Net migration, so defined, 
represents the balance of all movements, including those of persons 
who were born (for the total population or an age range including a 
newborn age cohort), died, entered the country, or left the country 
during the decade, and including, in effect, the return and multiple 
movements of the same individual. Using the model in Table 1, we 
may represent the total number of persons migrating into an area during
a decade ($M_i$), for a given age cohort, by


$M_i = L_i + D_i^a$,


where the components represent, respectively, in-migrants who live
until the end of the migration period and in-migrants who die after
entering the area. Similarly, we may represent the total number of
persons migrating out of an area ($M_o$) by


$M_o = L_o + D_o^a$,


where the components represent, respectively, surviving out-migrants
and out-migrants who die after leaving the area. As shown in Table 1,
we may also represent the components of net migration, for a given age
cohort, as follows:


$L_i - L_o$ = net migration among those living to the end of the decade;
$D_i^a - D_o^a$ = net migration among those who die during the decade.


Definition of “net population change due to migration” 


Exact net migration differs from “net population change attributable 
to migration,” another important concept of net migration, which, as 
the phrase suggests, refers to the total effective contribution of migra- 
tion to population growth or decline during a given period. For cohorts 
alive at the start of the period, this type of figure is comprised entirely of 
“net migrants” surviving to the end of the period and is symbolized by 
$L_i - L_o$. For the total population or an age range including the newborn
age cohorts, the positive or negative contribution of migration to the 
number of births in the area surviving to the end of the period, as well 
as the newborn “net migrants” surviving to the end of the period, is 
also included. The gross addition to the newborn cohorts through mi- 
gration, excluding the allowance for death during the period, may be 
symbolized by BP+B*— Boe— Bo. 


Definition of “census net migration” 


As has been suggested, the definition of net migration used as a 
standard in this paper also differs from “census net migration,” which 
is comprised entirely of “net migrants” alive and in the country at both
the beginning and end of the migration period. All persons born during
the period are conventionally considered nonmigrants, immigrants are
conventionally not included with the migrants, and emigrants, of
course, are not included in the census. For cohorts alive at the start of
the decade, the second and third definitions of net migration given here
are basically the same. In a sense, they are both roughly analogous to
describing the number of children under 10 years of age in a census as
an estimate of the number of births during the preceding 10-year
period, in that they count only survivors, not the events themselves.
The “census net migration” definition may be symbolized by $L_i - L_o$,
with the understanding that $L_i$ and $L_o$ refer only to persons alive and
in the country during the entire decade.


THE VITAL STATISTICS METHOD 


Net migration as we have first defined it above could be estimated 
exactly by the vital statistics method if complete and accurate vital 
statistics and population counts were available. The formula, for the 
total population of an area, is: 


(1) $\sum M = (\sum P_1 - \sum P_0) - (\sum B - \sum D)$,


in which $\sum B$ and $\sum D$, respectively, symbolize the total number of births
and deaths occurring to residents of the area, $\sum M$ the balance of all
movements into and out of the area, and $\sum P_0$ and $\sum P_1$, respectively,
the total population at the beginning and end of the period. Thus, in
keeping with our first stated definition, net migration equals net total 
population change minus natural increase or decrease measured by use 
of birth and death statistics for the area, including, it is to be noted, 
the births and deaths which occur to migrants residing in the area. 
Birth and death statistics classified on a residence basis and adjusted 
for underregistration should be used in applying the formula, of course. 
The formula is a solution for the net migration component of the almost
axiomatic equation, widely used in estimating population,
$\sum P_0 + \sum B - \sum D + \sum M = \sum P_1$.
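Formula (1) and the balancing equation from which it is derived can be illustrated with a few lines of code. This is a minimal sketch; the function name and the population, birth, and death figures are hypothetical.

```python
def net_migration_vital_statistics(pop_start, pop_end, births, deaths):
    """Formula (1): net migration = (P1 - P0) - (B - D)."""
    return (pop_end - pop_start) - (births - deaths)

# Hypothetical area totals for a decade (illustrative only).
pop_start, pop_end = 100_000, 112_000
births, deaths = 21_000, 11_000

net_migration = net_migration_vital_statistics(pop_start, pop_end, births, deaths)
print("Net migration:", net_migration)

# The balancing equation P0 + B - D + M = P1 holds by construction.
print(pop_start + births - deaths + net_migration == pop_end)
```

A negative result indicates net out-migration, since population grew by less than the natural increase.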

The statement of the vital statistics formula just given applies to the
total population of all ages; hence, the need for including births and 
the population age groups representing their survivors at the end of the 
decade as elements in the equation. This formula is particularly useful 
for making estimates of total net migration to counties, cities, states, 
and other areas for which the required vital statistics data are available. 
For illustrations of its use to estimate total net migration for the 1940- 
50 decade, see references [3] and [4]. 
For age cohorts born during the decade, formula (1) becomes


(2) $M = P_1 - B + D$,


in which the symbols refer only to a particular population cohort born
during the decade. For age cohorts born before the date of the earlier
census (10 years of age or over at the end of the decade), no entry for
births is involved in formula (1) and it becomes


(3) $M = P_1 - P_0 + D$.


It is with formula (3) that we shall be primarily concerned in the 
analysis which follows. 

Table 1 may be used to demonstrate how formula (3) makes proper 
allowance for those migrants, both in and out, who die after migrating.
From Table 1 it is evident that


(4) $M = M_i - M_o = L_i + D_i^a - L_o - D_o^a$;


i.e., total net migration equals surviving in-migrants and in-migrants
who die minus surviving out-migrants and out-migrants who die. It is
also evident from Table 1 that


$P_1 = L_n + L_i$,

$P_0 = C_n + C_o = L_n + D_n + L_o + D_o^a + D_o^b$,

and

$D = D_n + D_i^a + D_o^b$.


Substituting these values for $P_1$, $P_0$, and $D$ in formula (3) shows that

$P_1 - P_0 + D = L_i + D_i^a - L_o - D_o^a$,


or that formula (3) is equivalent to formula (4). 
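The equivalence of formulas (3) and (4) can be checked on a made-up cohort. The sketch below encodes the migration-status subcohorts of Table 1 as a small record; the class name, field names, and all counts are hypothetical, with the suffixes "after" and "before" marking deaths after and before migrating.

```python
from dataclasses import dataclass

@dataclass
class AgeCohort:
    survivors_nonmigrant: int    # non-migrants alive at end of decade
    deaths_nonmigrant: int       # deaths among non-migrants
    survivors_in: int            # in-migrants alive at end of decade
    deaths_in_after: int         # in-migrants dying after entering the area
    deaths_in_before: int        # in-migrant cohort dying before migrating (outside area)
    survivors_out: int           # out-migrants alive at end of decade
    deaths_out_after: int        # out-migrants dying after leaving the area
    deaths_out_before: int       # out-migrant cohort dying before leaving (inside area)

    @property
    def initial_population(self):
        """P0: non-migrant cohort plus out-migrant cohort, all within the area."""
        return (self.survivors_nonmigrant + self.deaths_nonmigrant
                + self.survivors_out + self.deaths_out_after + self.deaths_out_before)

    @property
    def final_population(self):
        """P1: surviving non-migrants plus surviving in-migrants."""
        return self.survivors_nonmigrant + self.survivors_in

    @property
    def deaths_in_area(self):
        """D: deaths occurring within the area during the decade."""
        return self.deaths_nonmigrant + self.deaths_in_after + self.deaths_out_before

    def formula_3(self):
        """Vital statistics estimate: P1 - P0 + D."""
        return self.final_population - self.initial_population + self.deaths_in_area

    def formula_4(self):
        """Exact net migration: all in-migrants minus all out-migrants."""
        return (self.survivors_in + self.deaths_in_after
                - self.survivors_out - self.deaths_out_after)

# Hypothetical counts (illustrative only).
cohort = AgeCohort(8000, 400, 1500, 40, 35, 1200, 30, 25)
print(cohort.formula_3(), cohort.formula_4())
```

Deaths of the in-migrating cohort before migration never enter either formula, which is consistent with their occurring outside the area.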

To use formula (3), death statistics classified by age are required, 
but they are not usually tabulated in this detail below the state level. 
The formula also involves, first, the use of an interpolation technique 
to subdivide deaths for each calendar year, tabulated in the conven- 
tional 5-year age groups according to age at death, into single years of 
age; second, the use of so-called “separation” factors to “separate” the 
single-year-of-age deaths into those which occur to a particular aging 
cohort and those which do not [5, p. 117 ff.]; and third, the summation 
of the appropriate parts to obtain the total number of deaths which 
occur during the decade to each age cohort. The deaths obtained by 
this subdivision, separation, and summation process are represented 
by D in Table 1 and formula (3). 
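For example, an estimate of the number of deaths during the decade
for the cohort 10 to 14 years of age on April 1, 1940 (20 to 24 years of
age on April 1, 1950) may be obtained by summing the deaths as shown
in the following table:


Calendar year    Deaths in cohort

1940    $\frac{15}{32}D_{10} + \frac{3}{4}D_{11} + \frac{3}{4}D_{12} + \frac{3}{4}D_{13} + \frac{3}{4}D_{14} + \frac{9}{32}D_{15}$

1941    $\frac{1}{32}D_{10} + \frac{23}{32}D_{11} + D_{12} + D_{13} + D_{14} + \frac{31}{32}D_{15} + \frac{9}{32}D_{16}$

...

1949    $\frac{1}{32}D_{18} + \frac{23}{32}D_{19} + D_{20} + D_{21} + D_{22} + \frac{31}{32}D_{23} + \frac{9}{32}D_{24}$

1950    $\frac{1}{32}D_{19} + \frac{1}{4}D_{20} + \frac{1}{4}D_{21} + \frac{1}{4}D_{22} + \frac{1}{4}D_{23} + \frac{7}{32}D_{24}$


In the table $aD_x$ equals the number of deaths of persons aged $x$ at the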
time of death during the calendar year indicated, in the cohort 10 to 14 
years of age on April 1, 1940, on the assumption that the deaths are 
distributed rectangularly (evenly) throughout the year of age and the 
calendar year. The computation of the number of deaths in the cohort 
during the census decade may be illustrated geometrically by the dia- 
gram (p. 486), in which each of the squares represents deaths at a 
particular age (within the range 10 to 24 years) in a particular year 
(1940 to 1950) and the heavy black line outlines the area (parallelo- 
gram) representing the deaths which occur to the cohort as it ages 
through the census decade, April 1, 1940, to April 1, 1950. Thus, for 
example, all deaths aged 17 in 1944, represented by the square out- 
lined by the dotted line, are included in the deaths occurring to our 
cohort, but only a portion (9/32) of the deaths aged 18 in 1943 (the 
latter being represented by the square enclosed by the broken line) are 
included in the deaths to our cohort. A proportion of $D_x$ shown in the
table corresponds to the ratio of the portion of the “cohort area” within 
any square to the total area of that square. 
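The proportions described above can be computed directly from the geometry of the parallelogram. The following is a sketch under the rectangular (uniform) distribution assumption stated in the text; the function names are hypothetical, time is measured in years from January 1, 1940, and exact fractions are used so that a value such as 9/32 is reproduced without rounding.

```python
from fractions import Fraction as F

START = F(1, 4)           # April 1, 1940, in years after January 1, 1940
END = START + 10          # April 1, 1950
AGE_LO, AGE_HI = 10, 15   # cohort of exact ages 10 <= age < 15 at START

def overlap(t, x):
    """Length of the age interval [x, x+1) occupied by the cohort at time t."""
    lo = max(F(x), AGE_LO + t - START)
    hi = min(F(x + 1), AGE_HI + t - START)
    return max(F(0), hi - lo)

def cohort_share(x, year):
    """Share of deaths at age x in calendar year 1940 + year that belong to the cohort."""
    t0, t1 = max(F(year), START), min(F(year + 1), END)
    if t0 >= t1:
        return F(0)
    # Times at which overlap() changes slope within the calendar year.
    cuts = {t0, t1}
    for age_edge in (x, x + 1):
        for base in (AGE_LO, AGE_HI):
            t = age_edge - base + START
            if t0 < t < t1:
                cuts.add(t)
    pts = sorted(cuts)
    # The trapezoid rule is exact because overlap() is linear between breakpoints.
    return sum((b - a) * (overlap(a, x) + overlap(b, x)) / 2 for a, b in zip(pts, pts[1:]))

for year in (0, 1, 9, 10):
    terms = [(cohort_share(x, year), x) for x in range(10, 25)]
    print(1940 + year, " + ".join(f"({c}) D{x}" for c, x in terms if c))
```

Summed over all ages and years the shares come to 50, the area of a parallelogram five years of age wide and ten calendar years long.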


THE SURVIVAL RATE METHOD 


Even where the requisite death statistics are available, the cumber- 
someness of the process described in the preceding section is a serious 
handicap to the use of the vital statistics method for estimating net 
migration for age groups. If the requisite death statistics are not avail- 
able, or if a high degree of accuracy is not required and the investigator 
wishes to avoid the painstaking work involved in determining deaths 
for moving age intervals, one or the other of the survival rate methods
may be used.

As mentioned earlier, the survival rate method uses life table survival
rates rather than death statistics to allow for mortality. A survival
rate represents the probability that a person in a given age cohort
will survive a given period of time; it is simply the complement of a
mortality rate. For example, if the mortality rate is .12, representing
120 deaths per 1,000 of the base population, the survival rate is .88,
representing 880 survivors per 1,000 of the base population. Methods
of computing survival rates from life tables and from United States
census data have been treated elsewhere [5, pp. 22-23; 6; 7, pp. 5-7].
It is important to note in this connection that the reliability of the
survival rate method depends to a great extent on the use of survival
rates which reflect accurately the mortality of the population group 
under study and which allow for errors in the population data due to 
underenumeration and misstatement of ages [6]. Further research is 
needed on the variation in the estimates of net migration resulting 
from variation in the survival rates selected. 

Even though a set of reasonably accurate survival rates may be 
derived, however, there is another source of error which has received 
little attention but of which the research worker should be aware. This 
error results from the inability of survival rates to measure exactly the 
number of deaths occurring in an area because of the way the rates are 
applied, or, from another point of view, because of the very occurrence 
of migration. Jaffe [7, p. 184] suggests the existence of this general type 
of problem and illustrates it. We cannot, however, fully accept his 
analysis and conclusions, based as they are on the premise that net 
migration relates only to survivors. We shall devote considerable at- 
tention in the remainder of this article to a description of this source of 
error and to suggestions as to how it may be handled. 

Commonly, in the application of the survival rate method by re- 
searchers, the measurement of the migration of the newborn cohorts 
has been neglected, so that an allowance for natality has not been in- 
volved in their work. Paralleling the measurement of mortality, either 
birth statistics (that is, the estimated total number of births occurring 
to residents of an area) or “cohort birth rates” (birth probabilities) 
may be used. A cohort birth rate, or birth probability, may be defined 
as the number of births which a thousand women living at the beginning 
or end of a period, respectively, will have (forward rate) or did have 
(reverse rate) during the period. Because cohort birth rates have rarely, 
if ever, been used in the measurement of net migration and because an 
analysis involving both cohort birth rates and survival rates would be 
quite complex, the present analysis will be restricted to the use of birth 
statistics for the natality allowance where newborn age groups are in- 
cluded. 


The forward survival rate formula 


The first survival rate formula to be considered, the forward, or con- 
ventional, survival rate formula, has been used quite widely to estimate 
net migration for population groups for which the required classifica- 
tions of death statistics were not available (e.g., the rural-farm and the 
rural-nonfarm segments of the population, areas not earlier included 
in the death registration area, population in the years prior to the 
establishment of the death registration area, etc.) and to estimate net 
migration for age groups even where the required death statistics were 
available. Rural sociologists have made considerable use of this method 
in measuring urban-rural migration and farm-nonfarm migration. 
The forward survival rate formula is 


(5)    $M_2 = P_1 - rP_0$, 


in which r represents the survival rate. According to this method, the 
population at a given age at the beginning of a decade is multiplied by 
an appropriate survival rate to obtain an estimate of the population 
10 years older that would be present at the end of the decade if no 
migration occurs. This number of expected survivors is then compared 
with the actual population in the age group at the end of the decade to 
determine the amount of net migration. If the “expected” population 
is smaller than the census count, a net in-migration is indicated; if the 
“expected” population is larger, a net out-migration is indicated. An 
estimate of the net migration of the newborn cohorts may be obtained, 
of course, by “surviving” the births during the decade to the end of the 
decade. General descriptions of the forward survival rate method have 
already been given elsewhere [6; 7, pp. 185-187]. For illustrations of the 
application of the method, see references [8] and [9]. 
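
The forward computation can be sketched in a few lines of Python. This is a minimal illustration only; the function name and the census figures are hypothetical and are not taken from the studies cited.

```python
def forward_net_migration(p0, p1, r):
    """Forward survival rate estimate, formula (5): P1 - r * P0."""
    if not 0 < r <= 1:
        raise ValueError("survival rate r must lie in (0, 1]")
    expected_survivors = r * p0      # population expected if no migration occurs
    return p1 - expected_survivors   # positive: net in-migration; negative: net out-migration


# Hypothetical figures: 10,000 persons aged 20-24 at the first census,
# 9,100 persons aged 30-34 at the second, 10-year survival rate of .95.
estimate = forward_net_migration(p0=10_000, p1=9_100, r=0.95)
print(round(estimate))
```

With these illustrative figures the expected survivors (9,500) exceed the census count, so the estimate is negative and a net out-migration is indicated.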

Formula (5) is similar to formula (3) in that $(1 - r)P_0$ has been 
implicitly substituted for D,⁴ i.e., 


(6)    $M_2 = P_1 - P_0 + (1 - r)P_0$. 


However, this substitution is not a valid one because, as can be seen 
from Table 1, the number of deaths to the cohort $P_0$, represented in 
effect by $(1 - r)P_0$ in formula (5), is equal to $D_n + D_o$ (or 
$D_n + D_o^a + D_o^b$), and not D (or $D_n + D_i^a + D_o^b$), as the 
substitution assumes. The difference between these values is 
$D_o^a - D_i^a$, i.e., the difference between the deaths of out-migrants 
and of in-migrants occurring after migration. This result reveals an 
important feature of the forward survival rate method; namely, that it 
defines net migration wholly in terms of the survivors of the migrating 
cohorts at the end of the decade. This may be demonstrated algebraically 
by substitution of the following equivalents in formula (6): 


$P_1 = L_n + L_i$ 
$P_0 = C_n + C_o = L_n + D_n + L_o + D_o^a + D_o^b$ 
$(1 - r)P_0 = D_n + D_o^a + D_o^b$. 


This gives 

(7)    $M_2 = L_i - L_o$, 


the net migration among the surviving population. 

⁴ Proof: If $P_1 - P_0 + D = P_1 - rP_0$; then $D = P_0 - rP_0 = (1 - r)P_0$. 
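
The substitution may be written out step by step. In this sketch the subscripts n, i, and o are taken to denote the nonmigrant, in-migrant, and out-migrant cohorts, and the superscripts a and b deaths occurring after and before migration; the label on M is used here only to mark the forward estimate.

```latex
\begin{align*}
M_{\text{forward}} &= P_1 - P_0 + (1 - r)P_0 \\
  &= (L_n + L_i) - (L_n + D_n + L_o + D_o^a + D_o^b) + (D_n + D_o^a + D_o^b) \\
  &= L_i - L_o .
\end{align*}
```

Every death term cancels, which is why the forward estimate contains survivors only.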

Thus, even though a reliable survival rate may be available, formula 
(5) does not yield an estimate of all net migration. It fails to include an 
allowance for the net number of persons who migrated in or out and 
then died. An estimate of this group can be made on the basis of certain 
reasonable assumptions, however. It may be assumed, for example, (1) 
that the various migration cohorts have the same mortality rates at a 
given age and (2) that half of the net number of deaths implied as 
occurring to the migrating cohorts occurs after migration. The 
following steps would be involved in applying these assumptions to 
obtain a revised estimate of net migration: (1) Estimating the implied 
original net size of the migrating cohorts: $(L_i - L_o)/r = C_i - C_o$; 
(2) computing the net deaths implied for these cohorts: 
$(C_i - C_o) - (L_i - L_o) = D_i - D_o$; (3) estimating the net deaths 
presumed to occur after migration: $(D_i - D_o)/2$; (4) combining the 
two “net migrant” groups: $L_i - L_o + (D_i - D_o)/2$. Obviously, the 
conditions assumed are purely fictional; although the first can hardly 
have an important effect on any computations and represents the only 
practical assumption that can be made, the second might lead to 
substantial error, as will be shown later. 
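
The four steps above may be sketched as follows. The function name, the `share_after` parameter (set at one-half, the second assumption above), and the figures are hypothetical; a single survival rate r is assumed for all migration cohorts, as in the first assumption.

```python
def revised_net_migration(p0, p1, r, share_after=0.5):
    """Forward estimate plus an allowance for migrants who moved and then died."""
    net_survivors = p1 - r * p0                  # L_i - L_o, formula (5)
    net_original = net_survivors / r             # step 1: C_i - C_o
    net_deaths = net_original - net_survivors    # step 2: D_i - D_o
    deaths_after = share_after * net_deaths      # step 3: net deaths after migration
    return net_survivors + deaths_after          # step 4: combined estimate


# Hypothetical cohort: 10,000 at the start, 8,400 at the end, r = .80.
revised = revised_net_migration(p0=10_000, p1=8_400, r=0.80)
print(round(revised))
```

Setting `share_after` to 0 reproduces the forward estimate, and setting it to 1 counts every implied death among the migrating cohorts as occurring after migration.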

Similarly, the use of survival rates, rather than death statistics, to 
allow for mortality in connection with the estimation of the population 
of an area for intercensal or postcensal years, theoretically involves 
some error. If, in the same connection, the net migration component is 
being estimated, a theoretically appropriate estimate will not be ob- 
tained if survival rates are first used to measure net migration in some 
segment of the population, such as the school-age population, from 
which the estimate of total net migration is derived. 

Before proceeding to the other survival rate formulas, we should like 
to note that formula (5) does provide a theoretically accurate estimate 
of the “net change in population attributable to migration” and of net 
surviving migrants, for age cohorts alive at the start of the decade, and 
as such is a valid measure. For some purposes, this is the type of figure 
desired, not an estimate of the volume of net migration. 


The reverse survival rate formula 


The second survival rate formula to be considered, the reverse sur- 
vival rate formula, has been used only rarely, if at all, and, to our 
knowledge, no specific reference to it appears in the literature; yet the 
reverse method seems as logical as the forward method for measuring 
intercensal migration. (The reverse approach to the estimation of 
mortality has not been very “popular” probably because it is not gen- 
erally applicable to postcensal estimation or forecasting.) 

The reverse survival rate formula is 


(8)    $M_3 = \frac{P_1}{r} - P_0 = \frac{P_1 - rP_0}{r}$. 

According to this formula the population at a given age (10 or over) at 
the end of a decade is divided by an appropriate survival rate to obtain 
an estimate of the population 10 years younger at the start of the 
decade on the assumption that no migration had occurred during the 
decade. This “expected” population is then compared with the corre- 
sponding actual population to determine the amount of net migration. 
If the expected population is larger, a net in-migration is indicated; if 
the expected population is smaller, a net out-migration is indicated. 
(An estimate of the net migration of the newborn cohorts may be ob- 
tained, of course, by “younging” the enumerated population under 10 
years of age at the end of the decade to birth and then subtracting the 
actual number of births during the decade.) 
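
A minimal sketch of the reverse computation, including the “younging” of the population under 10 to birth, is given below. The function names, the separate survival rate from birth, and all figures are hypothetical.

```python
def reverse_net_migration(p0, p1, r):
    """Reverse survival rate estimate, formula (8): P1 / r - P0."""
    expected_initial = p1 / r        # initial population implied if no migration occurred
    return expected_initial - p0     # positive: net in-migration; negative: net out-migration


def reverse_net_migration_newborn(p1_under_10, births, r_from_birth):
    """'Young' the enumerated population under 10 to birth, then subtract actual births."""
    return p1_under_10 / r_from_birth - births


# Hypothetical figures for one cohort and for the newborn group.
older = reverse_net_migration(p0=10_000, p1=8_400, r=0.80)
newborn = reverse_net_migration_newborn(p1_under_10=1_900, births=2_050, r_from_birth=0.95)
print(round(older), round(newborn))
```

For the same hypothetical cohort the forward formula would give 8,400 less 8,000, or 400, so the reverse estimate of 500 is the larger of the two.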

The same survival rate, r, is here used in both the forward and re- 
verse formulas. This assumes that the survival rates of the various 
migration cohorts are the same. This assumption helps to simplify the 
algebraic manipulations without affecting the results to any important 
extent (since we should not expect the survival rates of the various 
migration cohorts to differ much from one another); it also makes it 
possible to express each survival rate formula very simply in terms of 
the other, a matter to be discussed later. The assumption made is not 
involved where the formulas are expressed in more general terms 
involving C, L, or D. 

The reverse formula is also similar to formula (3), the formula for the 
exact method, except that $(1 - r)P_1/r$ has been implicitly substituted 
for D,⁵ i.e., 


(9)    $M_3 = P_1 - P_0 + \frac{(1 - r)P_1}{r}$. 


Here again an erroneous substitution has been made because, as 
reference to Table 1 shows, the number of deaths among the cohort whose 
survivors comprise $P_1$, represented in effect by $(P_1/r) - P_1$ in 
formula (8), is equal to $D_n + D_i$ (or $D_n + D_i^a + D_i^b$), and not 
$D_n + D_i^a + D_o^b$, as the substitution implies. The difference 
between these two values is $D_i^b - D_o^b$, i.e., the magnitude of the 
difference between the implied numbers of deaths among the in- and 
out-migrating cohorts before they have a chance to migrate. 
Substituting $L_i - L_o$ for its equivalent $P_1 - rP_0$ (both 
equivalent to $M_2$, as shown by formulas (5) and (7)) in formula (8) 
and using the equivalent values of Table 1, we can write the reverse 
formula as follows: 


(10)    $M_3 = \frac{P_1 - rP_0}{r} = \frac{L_i - L_o}{r} = C_i - C_o = L_i - L_o + D_i - D_o$. 


Thus, the reverse survival rate formula defines net migration wholly in 
terms of the original migrating cohorts at the beginning of the decade. 

⁵ Proof: If $P_1 - P_0 + D = (P_1 - rP_0)/r$, then $D = (P_1/r) - P_1 = (1 - r)P_1/r$. 

The average survival rate formula 


Since, as has been indicated, the forward survival rate formula im- 
plies that no persons who die during the period (including those who 
move and then die) have migrated, and since the reverse formula im- 
plies that all those in the migrating cohorts who die during the period 
(including those who die without actually moving) have migrated, it 
seems logical that an average of the two formulas would be better than 
either one. This average formula is 


(11)    $M_4 = \frac{1 + r}{2r}(P_1 - rP_0)$.⁶ 

By combining formulas (7) and (10) we can also write the average 
formula as follows: 


(12)    $M_4 = \frac{C_i - C_o + L_i - L_o}{2} = L_i - L_o + \frac{D_i - D_o}{2}$. 


The average formula implies a substitution for the D-factor in formula 
(3) of $[(1 - r)/2r](P_1 + rP_0)$,⁷ which equals $D_n + (D_i + D_o)/2$.⁸ 
This substitution for D is still theoretically not identical with the 
actual number of deaths occurring in the area and involves an error of 
$(D_o^a - D_i^a + D_i^b - D_o^b)/2$, or an average of the errors of the 
forward and reverse methods. However, formula (12) would be equivalent 
to formula (3) or (4) if $D_i^b - D_o^b = D_i^a - D_o^a$.⁹ Thus, the 
average formula gives the same result as the vital statistics formula 
if exactly one-half of the implied net number of deaths among the 
migrating cohorts occurs after migration. The average formula thus 
implies an even, or approximately even, flow of migrants during the 
migration period. 

⁶ Derivation: $M_4 = (M_2 + M_3)/2 = [P_1 - rP_0 + (P_1 - rP_0)/r]/2 = [(1 + r)(P_1 - rP_0)]/2r$. 

⁷ Proof: If $P_1 - P_0 + D = [(1 + r)/2r](P_1 - rP_0)$; then 
$D = (P_1 + rP_0 - rP_1 - r^2P_0)/2r = [(1 - r)(P_1 + rP_0)]/2r$. 

⁸ This expression can be developed simply by averaging the values of D 
implied by the forward and reverse survival rate formulas. 
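
The average estimate, and the number of deaths that each survival rate formula implicitly substitutes for D in formula (3), can be checked numerically. The sketch below assumes a single survival rate r for all cohorts; the function names and figures are hypothetical.

```python
def average_net_migration(p0, p1, r):
    """Average survival rate estimate: (1 + r) / (2r) * (P1 - r * P0)."""
    return (1 + r) / (2 * r) * (p1 - r * p0)


def implied_deaths(p0, p1, r):
    """Deaths implicitly substituted for D by the forward, reverse, and average formulas."""
    forward = (1 - r) * p0
    reverse = (1 - r) * p1 / r
    average = (1 - r) * (p1 + r * p0) / (2 * r)
    return forward, reverse, average


p0, p1, r = 10_000, 8_400, 0.80          # hypothetical cohort
avg_estimate = average_net_migration(p0, p1, r)
d_forward, d_reverse, d_average = implied_deaths(p0, p1, r)

# Each estimate equals P1 - P0 + (implied deaths), as in formula (3).
print(round(avg_estimate), round(p1 - p0 + d_average))
```

The implied deaths of the average formula are the mean of those implied by the forward and reverse formulas, in keeping with the averaging described above.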


Relation between the survival rate formulas 


If it is assumed that the various migration cohorts have the same 
survival rate, each of the survival rate formulas can be expressed as 
the product of a function of r and each of the other survival rate 
formulas. Hence, once net migration has been calculated for an age 
group by one method, the other two estimates can be computed very 
easily by using the appropriate conversion factors. They are as follows: 
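
$\text{Reverse} = \frac{1}{r}\,\text{Forward} = \frac{2}{1 + r}\,\text{Average}$ 

$\text{Average} = \frac{1 + r}{2r}\,\text{Forward} = \frac{1 + r}{2}\,\text{Reverse}$ 

$\text{Forward} = r\,\text{Reverse} = \frac{2r}{1 + r}\,\text{Average}$. 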
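
The conversion factors lend themselves to a small helper. The sketch below starts from a forward estimate and assumes the same survival rate r for all migration cohorts; the function name and the figures are hypothetical.

```python
def convert_from_forward(forward, r):
    """Derive the reverse and average estimates from a forward estimate."""
    return {
        "forward": forward,
        "reverse": forward / r,                    # Reverse = (1/r) Forward
        "average": forward * (1 + r) / (2 * r),    # Average = [(1 + r)/2r] Forward
    }


converted = convert_from_forward(forward=400, r=0.80)
print({name: round(value) for name, value in converted.items()})
```

The remaining factors follow by inversion; for example, multiplying the reverse estimate by r returns the forward estimate.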


The reverse method consistently gives larger estimates of net in- or 
out-migration at any age than the forward method; the average method 
gives intermediate results. The actual difference between any two esti- 
mates at any age depends both upon the magnitude of the initial 
estimate of net migration and the value assumed for r. Because the 
migration rate and the mortality rate are not positively correlated by 
age, the amount of difference between any two estimates will tend to 
vary only roughly, perhaps even irregularly, with respect to age. 

The per cent differences among the various survival rate estimates at 
any age depend entirely upon the value assumed for r. The difference 
between the forward and reverse estimates will equal or closely 
approximate (for most observed values of r) the mortality rate, and 
the difference between either of these and the average estimate, 
one-half of the mortality rate. (The exact relationships may be 
developed from the conversion formulas given above.) Hence, the more 
remote the period, the higher the age (above the very young ages), and 
the poorer the mortality record of the area or population group, the 
greater the difference to be expected between the various survival rate 
estimates. The differences will tend to increase rather regularly with 
advancing age (above the very young ages), becoming quite considerable 
in the very old ages. The differences between the forward and reverse 
estimates of 10-year intercensal net migration that may appear for 
various population groups for the decades 1900-1910 to 1940-1950 may be 
suggested and illustrated by the following figures developed on the 
basis of various official U. S. life tables: For white females the 
difference will approximate 10 per cent at some age (depending on the 
period) in the range 50 to 64 years of age (terminal age) and 20 per 
cent at some age in the range 60 to 74 years; for nonwhite males the 
difference will approximate 10 per cent at some point in the range 25 
to 44 years and 20 per cent in the range 50 to 59 years. Computations 
by Daniel O. Price and by the authors indicate differences between the 
forward and reverse survival estimates of intercensal net migration 
generally of 7 to 20 per cent, for the total population 10 years old 
and over, by color and sex, in several selected states for the 
1900-1910 to 1940-1950 decades. 

⁹ Proof: If $L_i - L_o + (D_i - D_o)/2 = L_i - L_o + D_i^a - D_o^a$; then 
$D_i - D_o = 2(D_i^a - D_o^a)$ and hence $D_i^b - D_o^b = D_i^a - D_o^a$. 
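
The exact percentage relationships follow directly from the conversion factors. The sketch below expresses each difference as a per cent of the larger of the two estimates compared; that choice of base, like the function name, is an assumption made for illustration.

```python
def percent_differences(r):
    """Per cent differences among the estimates, as functions of r alone."""
    return {
        "forward_vs_reverse": 100 * (1 - r),            # (Reverse - Forward) / Reverse
        "average_vs_reverse": 100 * (1 - r) / 2,        # (Reverse - Average) / Reverse
        "forward_vs_average": 100 * (1 - r) / (1 + r),  # (Average - Forward) / Average
    }


for r in (0.98, 0.90, 0.80):
    print(r, {name: round(value, 1) for name, value in percent_differences(r).items()})
```

On this base the forward-reverse difference equals the mortality rate exactly, and the reverse-average difference equals one-half of it; the forward-average difference is close to one-half of the mortality rate only when r is near 1.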


DETAILED ANALYSIS OF ERRORS IN SURVIVAL RATE FORMULAS 
Theoretical range of error in survival rate formulas 


The conditions stated or implied earlier for any survival rate formula 
to equal the vital statistics formula would not ordinarily be fulfilled 
exactly—not even that for the average formula—nor could it generally 
be known if they were fulfilled exactly. It might be worthwhile, there- 
fore, to measure the maximum amounts of error involved in the use of 
each of the three survival rate formulas. For this purpose we shall 
assume certain extremely abnormal situations of timing of in- and 
out-migration (and, hence, of timing of deaths among the migrating 
cohorts) and shall use formula (4) to represent exact net migration 
regardless of when in- and out-migration occurs: 


(4)    $M = L_i - L_o + D_i^a - D_o^a$. 


Now let us see what happens to the components of formula (4) when 
the following special situations prevail: 

Situation 1: All in-migration occurs instantaneously at the beginning 
of the decade and all out-migration occurs instantaneously at the end 
of the decade. In this situation, formula (4) becomes 


(13)    $M = L_i - L_o + D_i$ 


because there are no deaths during the decade among the out-migrants 
after they leave the area ($D_o^a = 0$) and the in-migrating cohort is 
exposed to the risk of death for a full decade after it enters the area 
($D_i^a = D_i$). 

Situation 2: All in-migration occurs instantaneously at the end of 
the decade and all out-migration occurs instantaneously at the 
beginning of the decade. In this situation 


(14)    $M = L_i - L_o - D_o$ 


because there are no deaths among the in-migrating cohort after it 
enters the area ($D_i^a = 0$) and the out-migrating cohort is exposed to 
the risk of death for a full decade after it leaves the area 
($D_o^a = D_o$). 

The maximum possible error in using the forward survival rate formula 
is $-D_i$ or $+D_o$, corresponding to the actual errors when this 
formula is used in situations 1 and 2; the maximum possible error in 
using the reverse survival rate formula is $-D_o$ or $+D_i$, 
corresponding to the actual errors when this formula is used in 
situations 1 and 2. The difference between the estimates of net 
migration in the two extreme situations is the sum of the implied 
numbers of deaths during the decade among the two migrating cohorts. 
That is, 


$(L_i - L_o + D_i) - (L_i - L_o - D_o) = D_i + D_o$. 


Since the average survival rate formula is equal to an average of 
formulas (13) and (14), the maximum possible error in using the average 
formula is one-half of the difference between the estimates under these 
two extreme assumptions, or $(D_i + D_o)/2$. 

It may be noted that under the conditions postulated above and 
particularly in the ages subject to high mortality, the errors in using 
the survival rate formulas, whether forward, reverse, or average, could 
in some cases considerably exceed the difference between the forward 
and reverse estimates (i.e., the mortality rate) and be so great as to 
invalidate the results completely. An illustration is provided below. 


Numerical examples of errors under conditions of abnormal timing of 
migration 

Table 2 illustrates, with the use of hypothetical data, the error, under 
the conditions of extremely abnormal timing of in- and out-migration 
postulated above, in the various survival rate estimates of net migra- 
tion. Sections a and b of the table, respectively, correspond to the tim- 
ing assumptions in situations 1 and 2 above. The enumerated initial 
and final populations, and the initial and final migrating cohorts, are of 
the same size in section a as in section b. A perfectly reliable survival 
rate of .80, equally applicable to all cohorts, is assumed throughout the 
table. The latter assumptions are made so that the differences in the 
resulting estimates of net migration, under a particular set of 
assumptions regarding the timing of migration, will be due wholly to the 
differences in method and that differences between the results under 
timing assumptions a and b will reflect the effect only of differences in 
the timing assumptions. 

It may be noted, first, that the estimates of net migration shown by 
the survival rate methods are the same in a as in b. This is so because a 
particular survival rate formula assumes a particular timing of in- and 
out-migration and cannot, therefore, allow for variations in the timing 
of migration, as does the vital statistics formula. Next, it may be seen 
that the three survival rate estimates under each timing assumption 
are rather different, both absolutely and relatively, from the exact net 
migration and from one another. For three out of the six survival rate 
estimates shown, the error exceeds 20 per cent, or the difference 
between the forward and reverse estimates. Finally, it is of interest that 
the best survival rate results were obtained, in this particular instance, 
by the reverse method under the conditions of a and by the forward 
method under the conditions of b. 
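
A comparable calculation can be sketched with illustrative figures. The cohort sizes below are hypothetical and are not those of Table 2; only the survival rate of .80, assumed equally applicable to all cohorts, is taken from the text.

```python
R = 0.80  # survival rate assumed common to all cohorts

# Hypothetical cohort sizes at the start of the decade.
cohort = {"non": 8_000, "in": 2_000, "out": 1_000}
survivors = {k: R * c for k, c in cohort.items()}        # L_n, L_i, L_o
deaths = {k: cohort[k] - survivors[k] for k in cohort}   # D_n, D_i, D_o

p0 = cohort["non"] + cohort["out"]         # enumerated initial population
p1 = survivors["non"] + survivors["in"]    # enumerated final population

# The survival rate estimates depend only on p0, p1, and R, not on timing.
estimates = {
    "forward": p1 - R * p0,
    "reverse": p1 / R - p0,
    "average": (1 + R) / (2 * R) * (p1 - R * p0),
}

base = survivors["in"] - survivors["out"]
exact = {
    "situation 1": base + deaths["in"],    # in-migration at start, out-migration at end
    "situation 2": base - deaths["out"],   # in-migration at end, out-migration at start
}

pct_error = {
    s: {m: 100 * (est - ex) / ex for m, est in estimates.items()}
    for s, ex in exact.items()
}
for s, errors in pct_error.items():
    print(s, {m: f"{e:+.1f}%" for m, e in errors.items()})
```

With these figures the three estimates are 800, 1,000, and 900 under either timing assumption, while exact net migration is 1,200 in situation 1 and 600 in situation 2; the reverse estimate is closest in the first case and the forward estimate in the second.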


Comparison of errors in the survival rate formulas under five “timing” 
situations 


Five formulas, representing “vital statistics” estimates of net migra- 
tion under five different combinations of assumptions with regard to 
the timing of in- and out-migration, are given below in terms of whether 
the “migrants” were living or dead at the end of the migration period. 
The errors in the use of each survival rate formula to measure net 
migration under the conditions indicated are shown beneath each 
equation, followed by a note commenting on the relative merits of the 
various survival rate formulas under these conditions. 


a. All in-migration at the beginning, all out-migration at the end of 
   the decade. 

       $M = L_i - L_o + D_i = C_i - L_o$ 

   Error: forward formula, $-D_i$; reverse formula, $-D_o$; average 
   formula, $-(D_i + D_o)/2$. 

   Note: All three formulas underestimate net in-migration and 
   overestimate net out-migration. The reverse formula is preferable 
   if there is a net in-migration, and the forward formula is 
   preferable if there is a net out-migration. The average formula is 
   suggested in the absence of knowledge as to the relative importance 
   of in- and out-migration. 

b. All in-migration and all out-migration at the beginning of the 
   decade. 

       $M = L_i - L_o + D_i - D_o = C_i - C_o$ 

   Error: forward formula, $-(D_i - D_o)$; reverse formula, no error; 
   average formula, $-(D_i - D_o)/2$. 

   Note: The reverse formula is exact, and the forward formula is 
   least acceptable. 

c. All in-migration and all out-migration at the middle of the decade 
   or in an even flow (i.e., so patterned that half of the implied net 
   deaths among the migrating cohorts occurs after migration). 

       $M = L_i - L_o + (D_i - D_o)/2 = (C_i - C_o + L_i - L_o)/2$ 

   Error: forward formula, $-(D_i - D_o)/2$; reverse formula, 
   $+(D_i - D_o)/2$; average formula, no error. 

   Note: The average formula is exact. The forward formula 
   underestimates net in-migration and net out-migration. The reverse 
   formula errs in the opposite direction. 

d. All in-migration and all out-migration at the end of the decade. 

       $M = L_i - L_o$ 

   Error: forward formula, no error; reverse formula, $+(D_i - D_o)$; 
   average formula, $+(D_i - D_o)/2$. 

   Note: The forward formula is exact and the reverse formula is least 
   acceptable. 

e. All in-migration at the end, all out-migration at the beginning of 
   the decade. 

       $M = L_i - L_o - D_o = L_i - C_o$ 

   Error: forward formula, $+D_o$; reverse formula, $+D_i$; average 
   formula, $+(D_i + D_o)/2$. 

   Note: All three formulas overestimate net in-migration and 
   underestimate net out-migration. The forward formula is preferable 
   if there is a net in-migration, and the reverse formula is 
   preferable if there is a net out-migration. The average formula is 
   suggested in the absence of any such knowledge. 


As indicated by the “Errors” and the “Notes” above, the three survival 
rate formulas vary in their success in estimating net migration 
under various conditions. However, because the pattern of in- and out- 
migration frequently approximates an even flow (even though the in- or 
out-movement may predominate), the average survival rate formula 
may, on the average, be expected to produce more accurate results 
than the other survival rate formulas. It is recommended for use when 
the most accurate results are sought and specific knowledge regarding 
the pattern of migration is not available. When some such knowledge 
is available, it may be used to select that survival rate formula which 
will give the most accurate results, as indicated above. It should be 
noted that when net migration at a particular age approximates zero, 
in three of the five timing situations described above, including the 
“average” situation, the three formulas are of virtually equal reliability 
and have rather small errors. Since with all formulas the error involves 
only the deaths among the migrating cohorts during the census decade 
and because of the increase in mortality with increasing age (above the 
very young ages), the relative size of the error may generally be 
expected to increase with increasing age for any particular population 
group. 


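
The five timing situations can be summarized by two quantities: the share of the in-migrant cohort's deaths, and the share of the out-migrant cohort's deaths, that occur after migration. The sketch below computes the error (estimate minus exact net migration) of each formula from those shares; the function name, the share parameters, and the death figures are hypothetical.

```python
def timing_errors(d_in, d_out, share_in_after, share_out_after):
    """Errors of the forward, reverse, and average formulas for a given timing pattern."""
    # Exact net migration exceeds L_i - L_o by the net deaths occurring after migration.
    exact_adjustment = share_in_after * d_in - share_out_after * d_out
    forward = 0 - exact_adjustment                 # forward formula counts no such deaths
    reverse = (d_in - d_out) - exact_adjustment    # reverse formula counts all of them
    average = (forward + reverse) / 2
    return forward, reverse, average


# (share of in-migrant deaths after migration, share of out-migrant deaths after migration)
SITUATIONS = {"a": (1.0, 0.0), "b": (1.0, 1.0), "c": (0.5, 0.5), "d": (0.0, 0.0), "e": (0.0, 1.0)}

# Hypothetical implied deaths among the in- and out-migrating cohorts.
results = {key: timing_errors(400, 200, *shares) for key, shares in SITUATIONS.items()}
for key, errors in results.items():
    print(key, errors)
```

Intermediate shares describe timing patterns between the five cases; the average formula is exact whenever the two shares are both one-half.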
Occurrence of extreme situations 


With respect to the problem of unusual situations giving rise to error 
in estimating net migration during a decade by the survival rate 
method, one may well ask: Do such extreme situations ever occur? 
Probably not, but illustrations of uneven and unbalanced in- and out- 
migration are not unusual during decades of war, depression, or other 
abnormal conditions. The ebb and flow of rural-urban migration is 
known to be highly correlated with variations in economic conditions. 
In the early 1930’s, for example, migration from cities to rural areas 
increased and migration from rural areas to cities decreased; whereas, 
in the later 1930’s migration turned again heavily from rural areas to 
cities. No doubt, in many areas conditions of highly unbalanced and 
uneven flow of migration occurred (ghost towns, drought-stricken areas, 
etc.). Also, during the early 1940’s booming war-time industrial centers 
attracted thousands of people, many of whom migrated away from 
such places in the late 1940’s. 


CONCLUDING REMARKS 


The total error involved in using the survival rate method is due to 
three sources: (1) inaccuracies in the census counts or the population 
estimates, types of data on which all methods described in this article 
depend; (2) unreliable survival rates; and (3) the inability of survival 
rates to measure deaths occurring in an area exactly. Although this 
article has given considerable attention to the third source of error, 
these errors are probably no greater than those introduced by using 
inaccurate census counts or survival rates which do not reflect local 
mortality experience accurately. Unfortunately, it will be almost im- 
possible to obtain either perfectly reliable census data or survival rates. 
However, these errors can be reduced to a minimum (1) by correcting 
census data for estimable errors (e.g., underenumeration of young 
children) and incomparabilities (e.g., residence of college students), 
(2) by using so-called “census survival rates” adjusted to local mor- 
tality experience, and (3) by testing survival rates carefully in areas 
and situations which are known to be more or less stable with regard to 
migration and mortality. 

What is needed now are some carefully designed studies which will 
provide estimates of the amount of error involved in using the survival 
rate method in its several forms. The results obtained by the vital 
statistics method may then be compared with those obtained by the 
survival rate method. This can be done only in the case of areas for 
which the needed vital statistics are available. If such a comparison 
shows relatively little error, then it should be somewhat safer to use the 
survival rate method on those parts of the areas involved for which 
segregated vital statistics are not available. 

In addition to testing the survival rate formulas by the method de- 
scribed, the research worker should use his judgment in selecting the 
appropriate survival rate formula. If the general migration pattern of 
an area is known or if it can be evaluated beforehand, a survival rate 
formula can be selected so as to reduce the error of estimate. Under 
some conditions the forward survival rate formula might be the most 
suitable; under other conditions the reverse formula would be the best; 
and in still others the average formula would give the most accurate 
results. Moreover, the survival rate method and the vital statistics 
method may be used effectively in combination. If estimates of net 
migration at each age are being developed by use of a survival rate 
method for a particular area, for example, it is suggested that the im- 
plied numbers of deaths at each age be adjusted to agree with an esti- 
mate of the total number of deaths based on registrations, where possi- 
ble. Estimates for a group of smaller areas may similarly be “controlled” 
to the total for the larger area containing them. 
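
One way such an adjustment might be carried out is sketched below. The text does not prescribe a method; the sketch assumes that the deaths implied by the forward formula at each age are scaled pro rata to the registered total, and the function name and figures are hypothetical.

```python
def controlled_net_migration(p0_by_age, p1_by_age, r_by_age, registered_deaths):
    """Scale implied deaths to a registered total, then apply formula (3) at each age."""
    implied = [(1 - r) * p0 for p0, r in zip(p0_by_age, r_by_age)]   # forward-implied deaths
    factor = registered_deaths / sum(implied)                         # pro rata adjustment
    adjusted = [d * factor for d in implied]
    migration = [p1 - p0 + d for p0, p1, d in zip(p0_by_age, p1_by_age, adjusted)]
    return adjusted, migration


# Two hypothetical age cohorts and a registered total of 275 deaths.
adjusted_deaths, net_migration = controlled_net_migration(
    p0_by_age=[1_000, 800],
    p1_by_age=[980, 560],
    r_by_age=[0.95, 0.75],
    registered_deaths=275,
)
print([round(d) for d in adjusted_deaths], [round(m) for m in net_migration])
```

The same proportional device could serve to control estimates for smaller areas to the total for the larger area containing them.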

Finally, it should be noted that the forward survival rate formula has 
certain advantages which will more than justify its continued use for 
many purposes: (1) it is simple and easy to use; (2) it is a time-saver 
and a work-saver; and (3) it measures rather accurately, for cohorts 
alive at the start of a period, (a) net population change attributable to 
migration and (b) net migration among the population cohorts alive at 
the end of the decade. 


SURVIVAL CURVE FOR CANCER PATIENTS 
FOLLOWING TREATMENT 


JOSEPH BERKSON AND ROBERT P. GAGE 
Mayo Clinic, Rochester, Minnesota 


On the basis of experience with calculated survivorships of 
patients following treatment for cancer, a simple function, in 
terms of two physically meaningful parameters, has been 
evolved, which fits such survivorship data very well. These 
two parameters can be used to compare succinctly the mor- 
tality of two groups, different in respect of treatment, type 
of cancer, or other characteristics. 

The parameters are c (“cured”), which represents the proportion of 
the population which is subject only to “normal” death rates, and 
$\beta$, which is the death rate from the cancer, to which the rest of 
the population [not “cured,” $(1 - c)$] is subject. Thus if one 
treatment is characterized by $c_1 = 0.30$, $\beta_1 = 0.25$, another by 
$c_2 = 0.20$, $\beta_2 = 0.15$, this could be interpreted as meaning that 
while the first treatment “cured” a larger proportion of the population 
than did the second treatment, it did not ameliorate the deaths 
attributable to cancer in the patients not cured as much as did the 
second treatment. 

If $l_T$ is the proportion of the total population surviving to time t, 
then the function is $l_T = c\,l_0 + (1 - c)\,l_0 e^{-\beta t}$, where 
$l_0$ is the net survivorship corresponding to “normal” deaths, obtained 
from standard life tables. A graphic method and also a “least squares” 
method of estimating c and $\beta$ are presented 
with an example, and the evaluated parameters are given for 
several series of treated cancer patients. Expectation of life 
and other functions of the life table also have been calculated 
from the evaluated parameters, for the same series. 
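
The function can be sketched as follows, using the two illustrative parameter pairs mentioned above. Time is assumed to be measured in years and the cancer death rate to be an annual rate; the “normal” survivorship here is a hypothetical constant 2 per cent annual mortality standing in for a life table, and all names are chosen for illustration.

```python
import math


def normal_survivorship(t):
    """Hypothetical stand-in for life table survivorship: 2 per cent annual mortality."""
    return 0.98 ** t


def total_survivorship(t, c, beta, normal=normal_survivorship):
    """Proportion surviving to time t: c * l0 + (1 - c) * l0 * exp(-beta * t)."""
    l0 = normal(t)
    return c * l0 + (1 - c) * l0 * math.exp(-beta * t)


first_5 = total_survivorship(5, c=0.30, beta=0.25)
second_5 = total_survivorship(5, c=0.20, beta=0.15)
first_40 = total_survivorship(40, c=0.30, beta=0.25)
second_40 = total_survivorship(40, c=0.20, beta=0.15)
print(round(first_5, 3), round(second_5, 3), round(first_40, 3), round(second_40, 3))
```

With these illustrative values the second treatment shows the higher survivorship at five years, while the first shows the higher survivorship in the long run, since survivorship relative to normal approaches c.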


PHYSICIANS have for a long time used the five-year survival rate as 
an index of the effectiveness of the treatment of cancer; it is 
frequently referred to by them as the “five-year cure rate.” The idea 
behind this latter term, we may suppose, is that with so mortal a disease 
as cancer, those who survive for this length of time can be considered 
cured. But this rate cannot, of course, be identified as the proportion of 
patients cured, since it cannot be assumed that necessarily all patients 
if untreated would be dead within five years, nor conversely that 100 
per cent of noncancerous persons would be living after this period of 
years. From time to time one can find in the medical literature the
reflections of a feeling that this index is not entirely satisfactory. In
particular it is realized that the deaths during the five-year period will
include deaths due to other causes than the cancer in question, and it
is considered not fair to “charge” these deaths to cancer. Conversely, 
however, it cannot be assumed that all deaths occurring after the five- 
year period are not due to the cancer; in specific cases, some deaths 
from the original cancer are known to have occurred many years after 
treatment. 

Different methods have been used for adjusting the rate as calculated 
from the data in hand, to take account of the “normal” deaths. One of 
these methods used by physicians is very simple and has a kinship to 
methods used to standardize crude death rates for the age and sex dis- 
tribution of the population studied. It consists in expressing the calcu- 
lated survival rate as a percentage of the survival rate for a normal 
population similar in constitution to the cancer population. Thus, if 
the calculated five-year survival rate for the cancer population is 60 per 
cent and in a comparable normal population the survival rate is 90 per 
cent, the rate is expressed as 67 per cent of normal. This simple empiri- 
cal adjustment can be given a theoretical meaning in terms of a model, 
which will now be developed. 
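
The adjustment just described is a single division. A minimal Python sketch using the 60 and 90 per cent figures quoted above; the function name `relative_survival` is our own label for illustration and does not come from the paper:

```python
def relative_survival(observed: float, expected: float) -> float:
    """Survival of the patient group expressed as a fraction of normal survival."""
    if not 0.0 < expected <= 1.0:
        raise ValueError("expected survival must lie in (0, 1]")
    return observed / expected


# Figures quoted in the text: 60 per cent observed, 90 per cent in the normal population.
ratio = relative_survival(0.60, 0.90)
print(f"{100 * ratio:.0f} per cent of normal")  # prints "67 per cent of normal"
```

The guard on `expected` is only a convenience of this sketch; the text itself places no such restriction.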

In Figure 1 are depicted the survival curves, calculated by elementary 
actuarial methods, from a large series of data referring to patients of 
the Mayo Clinic treated surgically for cancer of the stomach, these 
being subdivided according to histologic grade of malignancy. There is 
also shown in the figure a survival curve representing a general popula- 
tion of similar age and sex constitution, the last obtained from pub- 
lished United States Life Tables. Since the rates are plotted on a 
logarithmic grid, a ratio of rates corresponds to a difference on this 
scale. Attention is directed to the fact that the curve for the normal 
population is convex upward throughout its course; the cancer curves, 
on the other hand, are convex downward for a period of about eight 
years, and then become roughly parallel with the normal curve. We 
may give an interpretation to this pictorial representation. 

Since the survivals are logarithmically plotted, the slope represents 
the proportional decrement by death, which is to say the instantaneous 
death rate, sometimes referred to as the “force of mortality.” There- 
fore, the upward convexity of the curve for the general population re- 
flects the steadily increasing death rates with advancing age character- 
istic of the general population in the adult ages. The death rates for 
the cancer population on the other hand at first decrease, as evidenced 
by the decreasing slope, and then, as evidenced in the parallelism with 
the normal curves, attain approximate equality with normal death 
rates.

CANCER OF THE STOMACH: CORRELATION OF SURVIVAL AND GRADE OF MALIGNANCY (BRODERS)

Figure 1. Survival curves for patients with cancer of the stomach treated by
gastric resection at the Mayo Clinic; 2,682 cases subdivided according to
histologic grade of malignancy. There is also shown a survival curve for a normal
population of similar age and sex constitution. To be noted is the upward
convexity of the normal survival curve throughout its course, reflecting the increasing
death rates with advancing age of a normal population. In contrast, the curves for
the cancer cases show at first a downward convexity reflecting a decrease of the
death rate until about the eighth year after operation, when the curves become
approximately parallel with the normal curve. (From Berkson, Joseph, Walters,
Waltman, Gray, H. K. and Priestley, J. T.: Mortality and Survival in Cancer
of the Stomach: A Statistical Summary of the Experience of the Mayo Clinic.
Proceedings Staff Meeting, Mayo Clinic. In press.)


The fact that the death rates for the cancer groups, after a certain 
period of time, approach the normal rates, conveys an obvious sug- 
gestion: that a certain fraction of the cancerous population are dying 
at normal death rates, while the rest are subject to higher rates. As the 
latter group is depleted by death, the population that is left approaches 
the normal, and the predominant picture is that of a normal survivor- 
ship. 

Suppose we follow this idea for the formulation of a model. We shall
call the fraction of the population that is subject to normal death rates
the fraction “cured,” adopting a suggestion derived from the
formulation of Boag [1].¹ We shall assume that patients with a specified cancer
are, before treatment, all subject to the effect of two mortality forces,
$q_{ca}$ representing that for the cancer in question and $q_0$ representing all
other diseases, and that these act independently and simultaneously.
After treatment a fraction $c$ of the population is “cured” and subject
only to the mortality forces $q_0$, while the remainder $(1-c)$ is subject
to two forces, $q_0$ as before and $q_{ca}$, the value of $q_{ca}$ being not necessarily
equal to, and presumably less than, before treatment.

We may be sure that these assumptions oversimplify the facts, that 
the presence of cancer influences the probability of death from other 
causes, that there are specific seasonal characteristics of the different 
causes of death so that they do not operate with strict simultaneity, 
and that the effect of treatment on mortality is more complicated than 
the sharp dichotomization pictured. But it appears that these com- 
plexities do not disturb too violently the effective use of the simplified 
model, as will be evidenced later. 

If we think of the two hypothesized cohorts of the population
separately, then the probability of survival to a time $t$ of the cured cohort
which is subject to a succession of death rates $p_0 = 1 - q_0$ is given by the
continued product $l_0 = \prod p_0$;² for the uncured cohort it is given by
$l_0 l_{ca}$, where $l_{ca} = \prod p_{ca}$ and $p_{ca} = 1 - q_{ca}$. The probability of survival to
time $t$ for the total population is then

(1)   $l_T = c\,l_0 + (1-c)\,l_0 l_{ca}$.

If we express $l_T$ as a fraction of the normal survival $l_0$ following the
method of standardization mentioned, we have

(2)   $l' = l_T/l_0 = c + (1-c)\,l_{ca}$.

The equation (2) can be interpreted as giving the probability of
survival in the total population, if it is hypothetically freed of deaths from
other causes [2];³ in such a population the cured fraction would never
die. If we visualize the survival curve (2) as extending to the point
where the uncured fraction is practically entirely depleted, then $l_{ca}$ is
equal to zero and $l' = c$. We may now recall the survival curves of
Figure 1, and interpret the appearance of parallelism of the logarithmic
survival curves of the cancer and normal populations. This is the part
of the curve where the ratio of the cancer survival rate to the normal
survival rate, that is to say, $l'$, is almost constant, and hence the
distance between the logarithmic curves in the portion where they
approach parallelism represents an approximation of the logarithm of $c$,
the fraction cured.

¹ See Appendix Note 1.

² In order to avoid complexity of subscripts, symbols $q_0$, $q_{ca}$, $l_0$, $l_{ca}$, and so forth, are used loosely to
represent the probabilities concerned in general and also their specific values at time $t$.

³ If the probability of death in a population subject only to deaths from a specific cancer in the
absence of all other causes of death is $q_{ca} = 1 - p_{ca}$, and the probability of death in a population subject
only to death from other causes is $q_0 = 1 - p_0$, then the probability of survival with both causes effective
and acting independently is $p_T = p_{ca} p_0$, and hence $p_{ca} = p_T/p_0$. If $l'$ is to be interpreted as $p_{ca}$, therefore,
$l_0$ should represent the probability of survival in the population with all other causes but the specific
cancer operating. Strictly, the probability of survival in the general population should therefore be
corrected for deaths from the specified cancer before division into $l_T$, that is, the division should be by the
quantity $l_0/\prod(1 - q_{ca.n})$ where $q_{ca.n}$ is the death rate from the specific cancer in the general population.
However, the death rate from a specific cancer is only in the order of 1 to 2 per cent of the death rate
from all causes, and the effect of the correction on, say, the five-year or ten-year survival rate would only
be a fraction of 1 per cent and is quite negligible.

We wish to represent the model relation (2) by a functional equation.
The value of $l_0$ is the survival rate for a comparable normal population,
and this we obtain from the published life tables which we shall adopt
for the standardization.⁴ In addition, we shall need, besides the value
of $c$, to represent $l_{ca}$ functionally. The value of $l_{ca}$ represents the
survivorship in a hypothetical population subject only to death rates from
the specified cancer, and we shall have to make some assumption
regarding the instantaneous death rates in the portion of the population
dying from cancer. The simplest assumption is that this instantaneous
rate of risk is a constant. This is the assumption frequently made for
short periods, in the calculation of net from crude rates, where
competitive rates of decrement are involved, as illustrated in the formulations
of Greville [3] and those of Neyman [4]. In the present situation we are
boldly making this assumption to apply over the entire life span, in
respect to the net risk of death from cancer, in the uncured cohort. If
we symbolize the instantaneous rate as $\beta$, then $l_{ca} = e^{-\beta t}$ where $t$
represents time, and we have for (1)

(3)   $l_T = c\,l_0 + (1-c)\,l_0 e^{-\beta t}$.


There are only two adjustable parameters: $c$, representing the
fraction cured; and $\beta$, the instantaneous risk of death from cancer. These
were estimated by fitting (3) to the actuarially determined values of
$l_T$ by least squares, minimizing the squared residuals, unweighted. The
method of calculation, together with an example, is presented in
Appendix Note 3.
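
As an illustration of the functional form only, equations (2) and (3) can be written as two small Python functions. This is a sketch under stated assumptions: the normal survivorship values and the parameter values below are invented for the example and are not taken from any published life table or from the paper's fitted series.

```python
import math


def cancer_survival(t, c, beta, l0):
    """Equation (3): l_T = c*l0 + (1 - c)*l0*exp(-beta*t)."""
    return c * l0 + (1.0 - c) * l0 * math.exp(-beta * t)


def relative_to_normal(t, c, beta):
    """Equation (2) with l_ca = exp(-beta*t): l' = l_T / l0."""
    return c + (1.0 - c) * math.exp(-beta * t)


# Hypothetical normal survivorship l0 by year (illustration only, NOT a life table).
normal = {0: 1.000, 5: 0.930, 10: 0.840, 20: 0.600}
c, beta = 0.20, 0.30  # illustrative parameter values, not fitted estimates

for t, l0 in normal.items():
    l_total = cancer_survival(t, c, beta, l0)
    l_prime = relative_to_normal(t, c, beta)
    print(t, round(l_total, 4), round(l_prime, 4))
```

As $t$ grows, the second function tends to $c$, which is the plateau read off the parallel portion of the logarithmic curves.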

The curve (3) was fitted to several large series of cases for which we 
have an extensive and almost complete follow-up and to some data of 
Boag [1]. These are shown in Figures 2, 3, and 4. It is seen that the curves 
fit the observations quite satisfactorily. There is an appearance of
systematic deviation in some parts of the different curves, but these are
not consistent for the various series and in general the points fall along
the curves very well. We were, in fact, as surprised as we were gratified
to find that a curve with only two adjustable parameters, both
physically meaningful in terms of the model used, fitted so well such a
complicated course of events as is involved in the survivorship over a long
period of time of a cancer population.

⁴ See Appendix Note 2.

Figure 2. Survival curves for patients with cancer of the breast treated by
radical mastectomy at the Mayo Clinic; 6,426 cases, subdivided according to the
absence or presence of metastasis. The points are the actuarially calculated
survival rates. The smoothed curves represent the fitted function, the equations of
which are given in the figure. Values of $l_0$ correspond to survival rates for
1929-1931 beginning at age 52 years, Table 1B, United States Life Tables [5].

With the entire course of mortality established in the known equa- 
tion, other functions of the life table can be calculated, notably the ex- 
pectations of life. An example is shown in Table 1 for the Mayo Clinic 
series of cancer of the breast with metastasis. The quantities calculated 
furnish a simple type of population analysis that is interesting. We see 
that the expectation of life of the total population actually increases for 
a certain number of years, until the cancerous cohort of the population 
is depleted to the point at which the population left is so largely normal 
that the characteristic of the normal life table in the adult years, of a 
decreasing expectation, exhibits itself. This is rather the reverse of
what is found in some other diseases, such as diabetes [7], in which the
probability of death from the disease increases with passage of time.

Figure 3. Survival curves for patients with cancer of the stomach treated
by gastric resection at the Mayo Clinic; 3,963 cases, subdivided according to
the absence or presence of metastasis. The points are the actuarially calculated
survival rates. The smoothed curves represent the fitted functions, the equations
of which are given in the figure. Values of $l_0$ correspond to survival rates for
1929-1931 beginning at age 56 years, Table 1A, United States Life Tables [5].

The two parameters $c$ and $\beta$, together with the ratio of expectations
of life of the cancerous to the normal population, give a succinct sum- 
mary of the mortality of the cancer patients following treatment, and 
should serve as more comprehensive indexes of the effectiveness of 
treatment than the crude five-year survival rate. In Table 2 these con- 
stants are shown for illustration applying to the several series already 
referred to. 
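
The way the expectation of life of the total population depends on its two cohorts can be checked numerically. The sketch below assumes only the identity that the total expectation is the survivor-weighted mean of the cohort expectations; it is not presented as the authors' own computing routine, and the function name is ours. The inputs are the rows for $t = 0$ and $t = 5$ of Table 1.

```python
def total_expectation(n_cured, e_cured, n_uncured, e_uncured):
    """Survivor-weighted mean of the two cohort expectations of life."""
    return (n_cured * e_cured + n_uncured * e_uncured) / (n_cured + n_uncured)


# Rows of Table 1: t -> (N cured, e cured, N not cured, e not cured)
rows = {0: (200, 21.86, 800, 3.11), 5: (187, 18.14, 159, 3.05)}

for t, (n_c, e_c, n_u, e_u) in rows.items():
    e_total = total_expectation(n_c, e_c, n_u, e_u)
    # Second figure: expectation as a percentage of that of the cured (normal) cohort.
    print(t, round(e_total, 2), round(100 * e_total / e_c, 1))
```

To rounding, the results agree with the tabulated 6.86 and 11.21 years and with the ratios 31.4 and 61.8 per cent. The rise between the two rows comes from the growing share of cured survivors, not from any improvement within either cohort.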


APPENDIX NOTE 1 


We are indebted to the publication of Boag [1] for this as well as
other valuable suggestions. However, our formulation has been in the
process of development for a long time, on the basis of study of the
many survival curves for cancer patients which we have calculated, and
our work was rather differently motivated from Boag’s. We desired to
delineate comprehensively the entire survival curve of cancer patients,
rather than specifically to estimate the fraction of such patients
“cured.” Although the model we arrived at turned out to be very
similar to, if not quite identical with, Boag’s, the present authors alone are
responsible for the specific formulation of the problem and the method
of its solution as here presented and for the extension into the
calculation of expectations. The word “cured” should not be interpreted
otherwise than in terms of the hypothetical model. The parameter $c$ is an
estimate of the proportion of the treated patients who live out a normal
expectancy of life. Assuming the estimate to be fairly accurate, it still
may be argued that all or a portion of them would have lived as long
as they did, even if they had not been treated. Perhaps the term should
be “cure index.”

Figure 4. Survival curve for patients with cancer of the breast; 121 cases,
data of Boag. The points are the actuarially calculated survival rates. The
smoothed curve represents the fitted function, the equation of which is given in
the figure. Values of $l_0$ correspond to survival rates for women beginning at age
59 years, English Life Table No. 10, 1930-1932 [6].

Park and Lees [8] have objected strongly to the use of the five-year
survival rate as a measure of the cure rate, but they propose no
acceptable alternative method of calculation. They say, “The real cure rate
(in this series) is represented by the difference between the five-year
survival rate of all the cases after treatment and what would have been
the five-year survival rate of those same cases if they had not been
treated.” It is impossible to observe directly what would have been the
survival rate had treated patients not been treated, but even if this
could be ascertained indirectly the difference of five-year survival rates
would not represent the proportion of patients cured. If, to take a
hypothetical example, untreated patients with cancer had a five-year
survival rate of 15 per cent and an exactly similar normal population
had a survival rate of 85 per cent, then if some treatment effected a
100 per cent cure of the cancerous patients, the survival rate would be
85 per cent and the difference between survival rates with and without
treatment would be 70 per cent, not the 100 per cent which
hypothetically were cured. Even if it is hypothesized that untreated patients
would necessarily have a zero five-year survival rate (an assumption
which is not warranted in general), the difference would be only 85 per
cent. Park and Lees do offer a “suggested index of curability of cancer
of the breast” which they express as:

x/y = 0.83 for the five-year survival rate, where x is the percentage of
patients dying within five years, y is the percentage of patients with
axillary metastasis, and 0.83 is the standard by which curability is to
be measured. How x/y can be applied to measure the cure rate in cases
with metastasis or cases without metastasis, or why it should measure
the percentage cured when applied to a series of cases which contains
both, is not evident.

TABLE 1

SURVIVALS AND EXPECTATION OF LIFE: CANCER OF THE BREAST
WITH METASTASIS. DATA OF THE MAYO CLINIC

Hypothetical population of 1,000 treated patients

Years            Cured               Not cured                    Total population
after      Number   Expecta-    Number   Expecta-    Number   Proportion   Expecta-   Ratio to
treat-     living   tion,       living   tion,       living   cured,       tion,      cured,
ment       N_t      years*      N_t      years       N_t      per cent     years      per cent
t                   e_t                  e_t                               e_t
  0         200     21.86        800      3.11       1,000      20.0        6.86       31.4
  1         198     21.09        580      3.10         778      25.4        7.68       36.4
  2         195     20.34        420      3.09         615      31.7        8.56       42.1
  3         193     19.60        305      3.08         498      38.8        9.48       48.4
  4         190     18.86        220      3.07         410      46.3       10.39       55.1
  5         187     18.14        159      3.05         346      54.0       11.21       61.8
  6         184     17.43        115      3.03         299      61.5       11.89       68.2
  7         181     16.73         83      3.01         264      68.6       12.42       74.2
  8         178     16.05         60      3.00         238      74.8       12.76       79.5
  9         174     15.37         43      2.97         217      80.2       12.91       84.0
 10         170     14.71         31      2.96         201      84.6       12.90       87.7
 11         166     14.07         22      2.94         188      88.3       12.77       90.8
 12         162     13.43         16      2.90         178      91.0       12.48       92.9
 13         157     12.81         11      2.87         168      93.5       12.16       94.9
 14         152     12.21          8      2.85         160      95.0       11.74       96.2
 15+        147     11.63          6      2.82         153      96.1       11.28       97.0

* Values of $l_0$ and expectations for the cured fraction of the population as for United States white
females, continental United States, 1929 to 1931, Table 1B, beginning at age 52 years [5]. For cured
fraction of the population $N_t = 1{,}000\,c\,l_0$; for fraction of the population not cured $N_t = 1{,}000\,(1-c)\,l_0 e^{-\beta t}$.
$N_t$ was calculated in each case for successive values of $t$ till $N_t = 0$, but is given in the present table only
to $t = 15$. Expectations for the fraction of the population not cured and for the total population were
calculated from the approximate formula $e_t = (1/N_t)\sum N_t - 0.5$ where the summation of $N_t$ is from $t$
to $N_t = 0$.

TABLE 2

SUMMARY CONSTANTS FOR SEVERAL SERIES OF DATA

                                       Five-year   Cure     Net annual          Expectation
Data                                   survival    index    death rate       Years   Per cent of
                                       rate        c        $1 - e^{-\beta}$         normal
Cancer of stomach, Mayo Clinic
  Without metastasis                   0.48        0.46     0.30              9.4     54.4
  With metastasis                      0.19        0.17     0.44              4.4     25.2
Cancer of breast, Mayo Clinic
  Without metastasis                   0.78        0.64     0.10             16.6     75.8
  With metastasis                      0.34        0.20     0.27              6.9     31.4
Cancer of breast, data of Boag [1]     0.41        0.27*    0.23              7.?     42.1

* Boag estimated 0.26 by his method.


APPENDIX NOTE 2 


The point has been made that life tables for a general population 
may be unrepresentative of the population in hand, so that if, for
instance, the general death rates in the population under survey are
actually higher than those of the life table used, $l'$ will be overestimated, and
if they are lower, $l'$ will be underestimated. Actually, we use the best
established life table available, as closely comparable in time, and in 
sex, age, and race constitution as we can locate. We have found that, 
whichever we use of several available life tables that appear reasonable, 
there is not much difference in the estimates. One can always try 
another proposed life table than the one used for standardization, if it 
appears more appropriate. _ 

It has been suggested that the death rates from other causes be de- 
termined from the cancer population being studied, by establishing for 
each recorded death whether it was from cancer or from other causes. 
This, it seems to us, is a very unreliable basis for the accounting of 
death rates from other causes. The determination of whether a death is
entirely due to cancer or entirely due to other causes is difficult to
establish, if indeed it is even possible to define precisely. Actually, in
most cases it is impossible to establish unequivocally, and rules more
or less arbitrary have to be used to guide the certification. For instance,
Boag [1], who uses assigned causes, assigns operative deaths, or a death
the immediate cause of which is clearly other than cancer (say, suicide
or vehicular accident), to cancer rather than to other causes, if at the
time of death there was evidence that cancer was present. This he does
because it seems unreasonable to credit as representing a cancer “cure”
a patient who still has cancer, even if it was not the cancer that killed
him. The procedure is reasonable from that viewpoint, but it is
arbitrary. Another individual assigning causes might take a different view
and assign cause as found in the record, or interpret the record
differently, or insist that a necropsy should have been performed if
certification of absence of cancer is required. If a large series of cases is involved,
it is almost certain that at least a considerable number will be actually 
undeterminable as to cause of death, and the disposition of the recorder 
will be the deciding factor as to how these will appear in the statistical 
tabulation. Experience has shown that several physicians will produce 
different distributions of causes of death from the same medical records. 
Such differences between equally authoritative assessments of the cause 
of death produce larger differences of estimates than does the use of one 
or another life table. 

It would appear to be a matter of elementary interest to compare the 
deaths reported as from other causes than the cancer, with the number 
expected on the basis of some appropriate general life table. So far as 
the deaths attributed to other causes exceed normal deaths, even if 
these have been authentically recorded as to cause, it seems “fair” to 
account for the excess as deaths attributable to cancer. If we allow them 
to be included in the calculations as from other causes, then a patient 
who dies of pneumonia contracted because of a lowering of resistance 
brought on by the cancer or its treatment is as effectively counted as 
representing a “cure” as is a patient who lives out a normal expectancy, 
and the way is open to the achievement of a large “cure” rate by at- 
tributing an inordinately large number of deaths to other causes. 

For these reasons the senior author [9] has urged for many years that 
in calculating survival rates for cancer patients, if allowance is to be 
made for deaths due to other causes, this should be done, not on the 
basis of cause of death observed ad hoc in the population at hand, but 
from rates given in designated tables representing the general popula- 
tion. Such adjusted rates can be considered “standardized” on the basis 
of the life table used, rather than as carrying the implication that the
life-table rates are assumed to represent literally the death rates from
other causes in the population at hand.


APPENDIX NOTE 3. METHOD OF FITTING FUNCTION (3) 


The function is

(3)   $l_T = c\,l_0 + (1-c)\,l_0 e^{-\beta t}$.

There are two parameters to be estimated, $c$ and $\beta$. Expanding (3) in a
Taylor’s series, retaining only the first power terms,

(4)   $l_T = c_0 l_0 + (1-c_0)\,l_0 e^{-\beta_0 t} + \frac{\partial l_T}{\partial c_0}\,\Delta c + \frac{\partial l_T}{\partial \beta_0}\,\Delta\beta$

where $c_0$, $\beta_0$ are provisional estimates of $c$ and $\beta$, and $\Delta c$ and $\Delta\beta$ are
corrections to these, which are to be determined.

TABLE 3

CANCER OF THE BREAST WITHOUT METASTASIS, DATA OF THE MAYO
CLINIC: CALCULATION OF $l'$ FOR GRAPHIC SOLUTION

* From United States Life Tables, Department of Commerce, Bureau of the Census, Washington,
D. C., 1936, Table 1B, white females 1929-1931, beginning at age 52 years.

Values for $l_T$ are determined, using an actuarial method such as we
have described [10]; and with a schedule of values for $l_0$ obtained from
some general life table which is considered applicable to the population
at hand, the successive values of $l' = l_T/l_0$ are obtained. A provisional
value of $c$ is taken, and subtraction of this from the successive values
of $l'$ gives a series of values which, plotted on arithlog paper against $t$,
should be linear. A line is fitted graphically to the points, intercepting
the $l'$ scale at $(1-c)$, and the slope of this line will provide a provisional
value for $\beta$. Usually for this preliminary fit we do not plot the points
beyond ten years following treatment. If the plot of $l' - c$ on arithlog
paper appears to have curvature, another value of $c$ may be tried until
a fairly linear appearance of the trend of points is obtained. The values
of $c$ and $\beta$ estimated in this way may be satisfactory for most practical
purposes. A more definitive estimate can be obtained by a least-squares
procedure as follows.
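
Before turning to the least-squares refinement, the graphic step can be imitated numerically. The sketch below swaps the line drawn by eye on arithlog paper for an ordinary least-squares line through $\ln(l' - c_0)$ plotted against $t$; that substitution, the function names, and the synthetic data are assumptions of this example, not the authors' procedure.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares line y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def provisional_beta(ts, l_rel, c0):
    """Regress ln(l' - c0) on t; returns (beta0, value of l' - c0 at t = 0)."""
    pts = [(t, math.log(v - c0)) for t, v in zip(ts, l_rel) if v > c0]
    intercept, slope = fit_line([p[0] for p in pts], [p[1] for p in pts])
    return -slope, math.exp(intercept)


# Synthetic l' values generated from c = 0.60, beta = 0.10 (illustrative, not the Mayo data).
ts = list(range(1, 11))  # first ten years only, as in the text
l_rel = [0.60 + 0.40 * math.exp(-0.10 * t) for t in ts]

beta0, at_zero = provisional_beta(ts, l_rel, c0=0.60)
print(round(beta0, 4), round(at_zero, 4))
```

With the correct provisional $c_0$ the points are exactly linear and the intercept comes out at $1 - c_0$; a poorly chosen $c_0$ shows up as curvature, which is the signal the text uses to try another value.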

Using the values of $c$ and $\beta$ obtained by the graphic fit as provisional
values, symbolized respectively as $c_0$ and $\beta_0$, calculate for each value of
$l_T$ the following working values:

(5)   $Y = l_T - l_0\left[c_0 + (1-c_0)\,e^{-\beta_0 t}\right]$

(6)   $X_1 = \frac{\partial l_T}{\partial c_0} = l_0\left[1 - e^{-\beta_0 t}\right]$

(7)   $X_2 = -\frac{\partial l_T}{\partial \beta_0} = l_0\left[(1-c_0)\,t\,e^{-\beta_0 t}\right]$.

The equation for minimization of the summed squared residuals of $Y$
is then

(8)   $Y = \Delta c\,X_1 - \Delta\beta\,X_2$.

The normal equations to be solved are⁵

(9)   $\sum YX_1 - \Delta c \sum X_1^2 + \Delta\beta \sum X_1X_2 = 0$
(10)  $\sum YX_2 - \Delta c \sum X_1X_2 + \Delta\beta \sum X_2^2 = 0$.

The values of $c$ and $\beta$ obtained in this way may be used as new
provisional values $c_0$ and $\beta_0$ to obtain second estimates. For definitive
estimates we continue iterations until constancy in the second decimal
place in the estimate of both parameters is obtained.

An example of the calculations for data of the Mayo Clinic referring
to cancer of the breast in females follows in Table 3, Figure 5, and
Table 4.

⁵ These equations minimize the sums of the squared residuals of the actuarially calculated values
of $l_T$ from the fitted function (1). This procedure does not yield a least-squares solution in the proper
Gaussian sense. For such a solution, the observations, the residuals of which are minimized, must be
independent, and weighted in reciprocal to their sampling variances. Obviously the calculated values of
$l_T$ do not meet these requirements. This does not mean that the present method will not give a good fit;
it may in fact provide better estimates than those given by methods which have been advanced, such
as the method of maximum likelihood. It has been found, for instance, that in some cases application of
the principle of the minimum $\chi^2$ yields estimates which have smaller sampling error than the maximum
likelihood estimates [11]. Brown and Flood [12] used the principle of minimum $\chi^2$ for estimating
parameters of the mortality curve of tumbler breakage. The “least-squares” procedure outlined in this paper
is much simpler to apply than maximum likelihood, and it apparently provides a good approximation
of minimum $\chi^2$, since for some cases in which the maximum likelihood estimates were available, we
calculated the parameters by the present method and found that the $\chi^2$ was consistently lower for these
estimates. It is planned to investigate this problem further, together with the question of the theoretical
sampling errors of the estimates.

Figure 5. Cancer of the breast without metastasis; data of the Mayo Clinic.
Plot of $l' - c$ versus $t$ with $c$ taken as 0.60. A straight line is fitted to the points by
eye, intercepting the $l'$ ordinate scale at $(1-c)$. The slope gives an estimate of
$\beta_0 = 0.10$.

TABLE 4

CANCER OF THE BREAST WITHOUT METASTASIS, DATA OF THE MAYO CLINIC
ESTIMATE OF $c$ AND $\beta$

$c_0 = 0.60$   $\beta_0 = 0.10$
$Y = l_T - l_0\left[c_0 + (1-c_0)\,e^{-\beta_0 t}\right]$   $X_1 = l_0\left(1 - e^{-\beta_0 t}\right)$   $X_2 = l_0\left[(1-c_0)\,t\,e^{-\beta_0 t}\right]$

$\sum X_1^2 = 3.122083$   $\sum X_2^2 = 17.654446$   $\sum X_1X_2 = 7.274147$
$\sum X_1Y = 0.012647$   $\sum X_2Y = 0.018842$

$0.012647 - 3.122083\,\Delta c + 7.274147\,\Delta\beta = 0$
$0.018842 - 7.274147\,\Delta c + 17.654446\,\Delta\beta = 0$

$\Delta\beta = 0.015055$    $\beta = 0.115055$
$\Delta c = 0.039129$    $c = 0.639129$

Second iteration:
$\Delta\beta = -0.000438$    $\beta = 0.114562$
$\Delta c = -0.005874$    $c = 0.633126$

Third iteration:
$\Delta\beta = -0.000448$    $\beta = 0.114552$
$\Delta c = -0.000094$    $c = 0.633094$
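
The iteration of this Note can be sketched in Python as follows. The function names, the fixed count of ten iterations, and the synthetic data (an invented $l_0$ schedule with true values $c = 0.63$, $\beta = 0.115$) are assumptions made for illustration; only the five sums passed to the first call are taken from Table 4.

```python
import math


def solve_corrections(s11, s22, s12, s1y, s2y):
    """Solve normal equations (9) and (10) for (delta_c, delta_beta).

    s1y - delta_c*s11 + delta_beta*s12 = 0
    s2y - delta_c*s12 + delta_beta*s22 = 0
    """
    det = s12 * s12 - s11 * s22
    delta_c = (s12 * s2y - s22 * s1y) / det
    delta_beta = (s11 * s2y - s12 * s1y) / det
    return delta_c, delta_beta


def iterate_once(ts, l_T, l_0, c0, beta0):
    """One correction step built from the working values (5), (6) and (7)."""
    s11 = s22 = s12 = s1y = s2y = 0.0
    for t, lt, l0 in zip(ts, l_T, l_0):
        decay = math.exp(-beta0 * t)
        y = lt - l0 * (c0 + (1.0 - c0) * decay)  # equation (5)
        x1 = l0 * (1.0 - decay)                  # equation (6)
        x2 = l0 * (1.0 - c0) * t * decay         # equation (7)
        s11 += x1 * x1
        s22 += x2 * x2
        s12 += x1 * x2
        s1y += x1 * y
        s2y += x2 * y
    delta_c, delta_beta = solve_corrections(s11, s22, s12, s1y, s2y)
    return c0 + delta_c, beta0 + delta_beta


# First-iteration sums as printed in Table 4.
dc, db = solve_corrections(3.122083, 17.654446, 7.274147, 0.012647, 0.018842)
print(round(dc, 3), round(db, 3))

# Synthetic data: hypothetical l_0 schedule, generated with c = 0.63, beta = 0.115.
ts = list(range(1, 16))
l_0 = [1.0 - 0.012 * t for t in ts]
l_T = [l0 * (0.63 + 0.37 * math.exp(-0.115 * t)) for t, l0 in zip(ts, l_0)]

c, beta = 0.60, 0.10  # provisional values, as from a graphic fit
for _ in range(10):
    c, beta = iterate_once(ts, l_T, l_0, c, beta)
print(round(c, 4), round(beta, 4))
```

Solving the printed normal equations gives corrections that agree with the tabulated $\Delta c = 0.039129$ and $\Delta\beta = 0.015055$ to about the fourth decimal place; the small difference is presumably rounding in the printed sums. A stopping rule based on constancy in the second decimal place, as the text recommends, could replace the fixed iteration count.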


REFERENCES 


[1] Boag, John W., “Maximum Likelihood Estimates of the Proportion of Pa- 
tients Cured by Cancer Therapy,” Journal of the Royal Statistical Society, 
11 (1949), 15-53. 

[2] Karn, M. N., “A Further Study of Methods of Constructing Life Tables 
When Certain Causes of Death Are Eliminated,” Biometrika, 25 (1933), 
91-101. 

[3] Greville, T. N. E., “Mortality Tables Analyzed by Cause of Death,” The 
Record, American Institute of Actuaries, 37 (October, 1948), 283-94. 

[4] Neyman, Jerzy, First Course in Probability and Statistics. New York: Henry 
Holt & Company, 1950. 

[5] United States Life Tables. (Prepared by Dr. J. A. Hill.) Washington, D. C.: 
United States Government Printing Office, 1936. 

[6] Watson, A. W., Life Tables, The Registrar-General’s Decennial Supplement, 
England and Wales, 1931, p. 7, London, 1936. 

[7] Berkson, Joseph, Gage, R. P., and Wilder, R. M., “Mortality and Longevity 
Among Patients With Diabetes Mellitus,” Proceedings of the American 
Diabetes Association, 7 (1947), 133-44. 

[8] Park, W. W., and Lees, J. C., “The Absolute Curability of Cancer of the 
Breast,” Surgery, Gynecology and Obstetrics, 93 (1951), 129-52. 

[9] Berkson, Joseph, “The Calculation of Survival Rates.” In Walters, Walt- 
man, Gray, H. K., and Priestley, J. T., Carcinoma and Other Malignant 
Lesions of the Stomach. Philadelphia: W. B. Saunders Company, 1942, pp. 
467-84. 

[10] Berkson, Joseph, and Gage, R. P., “Calculation of Survival Rates for Can- 
cer,” Proceedings of the Staff Meetings of the Mayo Clinic, 25 (May 24, 1950), 
270-86. 

[11] Berkson, Joseph, “Relative Precision of Minimum Chi-square and Maxi- 
mum Likelihood Estimates of Regression Coefficients,” In Neyman, Jerzy, 
Proceedings Second Berkeley Symposium on Mathematical Statistics and 
Probability. Berkeley and Los Angeles: University of California Press, 1951, 
pp. 471-79. 

[12] Brown, G. W., and Flood, M. M., “Tumbler Mortality,” Journal of the
American Statistical Association, 42 (1947), 562-74.





AN APTITUDE TEST FOR PREDICTING SUCCESS IN 
A FIRST COURSE IN STATISTICS* 


D. F. Votaw, Sr. 
Southwest Texas State Teachers College, San Marcos, Texas 


Steps in the development of an aptitude test for the pre- 
diction of success in a first course in statistics are described in 
this paper, with stress placed upon actual success as the cri- 
terion of validity in selecting test items. A brief description of 
techniques, as well as a few typical items, is presented. 


INTRODUCTION 


Perhaps most of the members of the American Statistical Association
and the Biometrics Society are interested primarily in dealing
with statistical talent after it has already been discovered, is somewhat 
developed, and is well on the way to useful service. Much earlier than 
this, however, someone must deal with young people who are still un- 
certain of their talents and still trying to orient themselves to the com- 
plexities of industry, the sciences, and the professions. There is much 
evidence that many potential statisticians have been lost in the con- 
fusion of unsystematic methods (or no methods) of dealing with them 
in the early stages. 

Many students at the sophomore level in undergraduate college are 
very earnest about discovering their powers and their limitations. At 
the same time they often resent anything that resembles the pigeon- 
hole variety of guidance. The guidance activities at this level should 
attempt to secure the student’s confidence in measuring instruments 
applied to him and to provide the student with interpretations of the 
results, including liberal tolerances. 

Cronbach [1] defines capacity as, “ . . . the person’s hypothetical potentiality
of acquiring a reaction, with training.” Freeman [4] is more
specific in his statement, “An aptitude is a condition or set of character- 
istics indicative of an individual’s ability to acquire with training some 
specific knowledge, skill, or set of responses.” The experience of Dailey 
[2] in the development of the airman classification test battery led him 
to be even more specific in defining an aptitude in the statement, 
“...each task requires its own unique pattern of aptitudes to learn 
it.” 

* Presented at a joint meeting of the Biometrics Society, Eastern North American Region, and the
American Statistical Association at Boston, December 28, 1951.

The purpose of this study was to investigate at Southwest Texas State
Teachers College the possibility of developing a measure of aptitude
for achievement in a first course in statistics that would be
predictively superior to measures already recorded on the permanent records
of prospective students. At the time this study was begun, measures 
available were test scores on ACE quantitative, ACE linguistic, vo- 
cabulary, reading speed, reading comprehension, knowledge of use of 
library, and algebra. The sum of these variables, combined according 
to best weights, gave a correlation of +.51 with the final test score on 
the first course in statistics. 


PROCEDURES AND RESULTS 


The procedure adopted in the search for a more highly predictive 
measure was to make up a preliminary list of sixty-five test items with 
special attention being given to types of performances that arise in 
simple statistical processes. The items were then arranged in tentative 
order of assumed difficulty. The items were then administered to a 
try-out group of 176 sophomore students—both sexes—as a pre-test 
in a first course in educational statistics. Liberal time for each student 
to try all items was allowed. 

A common method of determining item validity at this point is to 
compare the proportion of correct responses to each item by a group 
that has performed poorly on the test with the proportion of correct 
responses to the item by a group that has performed well on the test. 
It is often necessary to resort to this method in building a test but it has 
the obvious weakness of utilizing the test being validated as its own 
criterion. The method is merely a check on internal consistency of 
items. In the case at hand a better criterion was available—the final 
success of 171 students at the end of the course. (Five had dropped out 
for various reasons.) The pre-test papers were laid aside, therefore, 
awaiting the end of the course. 

During the semester, seven objective tests on the subject matter of 
the course were administered and the total of the seven test scores for 
each student became the measure of his success in statistical achievement.
These scores ranged fairly normally from 12 to 122 with a mean
of 68.27 and a standard deviation of 24. 

Two widely-spaced performance groups were then segregated (see 
Kelley [6]) consisting of the high twenty-seven per cent and the low 
twenty-seven per cent, each being comprised of forty-six students. The 
original pre-test papers of these students were then analyzed item by 
item for a comparison of group successes. 

A few of the items were answered correctly by a larger proportion of
the low group than the high group. One such item, for example, was
item 26:

Chapter II of a certain book includes pages 32 to 59. If half the chapter is
assigned, to what page must the class read? Determine this by the most
economical process. Show the process in the box at the right.


An answer was marked wrong, even though the correct page was 
given, unless the process of adding 31 and 59 followed by dividing the 
sum by 2 was employed. The fault in this item became visible after 
comparing responses of good and poor students; uncertainty was raised 
in the minds of many good students as to whether the computed mid- 
page, 45, was the desired answer or page 46, the one read to but not 
included in the reading. Needless to say, this item and three additional 
ones found to be adversely selective were discarded since they would 
tend to give poor students score-point advantages. 

It should not be assumed, however, that all items on which upper 
students did better than lower students were selective. Final evidence 
of the selectivity of an item will be found if the proportion of upper 
students responding correctly to it exceeds the proportion of lower stu- 
dents responding correctly by an amount that cannot be explained 
readily as having occurred by chance. The inspection of each item re- 
quires determination of the standard error of each proportion, determi- 
nation of the difference between the two proportions, and determination 
of the standard error of this difference. If the difference amounts to as
much as or more than two times its standard error, the item may be said
to be selective with about ninety-five per cent confidence.
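
The test just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and the sample counts are hypothetical, and only the group size of 46 and the threshold of 2 are taken from the text.

```python
import math


def critical_ratio(upper_correct, lower_correct, n):
    """Difference between two proportions divided by its standard error.

    Both groups are assumed to have the same size n, as in the study.
    """
    p1 = upper_correct / n
    p2 = lower_correct / n
    se_diff = math.sqrt(p1 * (1 - p1) / n + p2 * (1 - p2) / n)
    if se_diff == 0:
        # Both proportions are exactly 0 or 1, so the ratio is degenerate.
        return float("nan") if p1 == p2 else math.copysign(math.inf, p1 - p2)
    return (p1 - p2) / se_diff


# Hypothetical item: 30 of 46 upper-group and 20 of 46 lower-group students correct.
ratio = critical_ratio(30, 20, 46)
print(round(ratio, 2), ratio >= 2)
```

An item would be called selective under this rule only when the printed ratio is at least 2.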

The foregoing steps may be consolidated into a general equation as
follows (see Votaw [7]):

$$SE_{p_1} = \sqrt{\frac{p_1 q_1}{N}} \quad\text{and}\quad SE_{p_2} = \sqrt{\frac{p_2 q_2}{N}}.$$

Therefore,

$$SE_{(p_1 - p_2)} = \sqrt{\frac{p_1 q_1}{N} + \frac{p_2 q_2}{N}}.$$

Set the difference $p_1 - p_2$ equal to $k$ times this standard error,
substitute $1 - p_1$ for $q_1$ and $1 - p_2$ for $q_2$, and solve the equation for $p_1$:

$$p_1 = \frac{2p_2 + \dfrac{k^2}{N} + \sqrt{\dfrac{4k^2}{N}\left(2 + \dfrac{k^2}{N}\right)\left(p_2 - p_2^2\right) + \dfrac{k^4}{N^2}}}{2\left(1 + \dfrac{k^2}{N}\right)}$$

In this equation

$p_1$ is the proportion of upper-group successes,
$p_2$ is the proportion of lower-group successes,
$N$ is the number of students in each group, and
$k$ is the arbitrary minimum critical ratio.


When the N value (here twenty-seven per cent of 171, or 46) and
the k value (here an arbitrary 2) are substituted in the general equation,
a special-case equation emerges,

$$p_1 = \frac{2p_2 + .086957 + \sqrt{.725000\,(p_2 - p_2^2) + .007561}}{2.173914},$$


the graph of which provides a convenient means of subjecting each 
item to the test of selectivity. The graph, reproduced herein on a small 
scale, should be entered from the left margin with the number of lower- 
group successes on an item. The exit point at the bottom will then indi- 
cate the minimum upper-group successes that will be required to render 
the item acceptable. An item-analysis chart [3] may be used with ap- 
proximately the same results. 
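
In place of the graph, the same lookup can be sketched numerically. The Python below is an illustration, not the author's procedure: it solves the condition that the difference between the two proportions equals k times its standard error, and the function names and the shorthand c = k²/N are assumptions introduced here. With N = 46 and k = 2 the coefficient under the radical computes to about .7259, close to the .725000 printed in the special-case equation.

```python
import math


def min_upper_proportion(p2, n=46, k=2.0):
    """Upper-group proportion at which (p1 - p2) equals k standard errors."""
    c = k * k / n
    disc = 4 * c * (2 + c) * (p2 - p2 * p2) + c * c
    return (2 * p2 + c + math.sqrt(disc)) / (2 * (1 + c))


def min_upper_successes(lower_successes, n=46, k=2.0):
    """Smallest whole number of upper-group successes meeting the standard.

    Returns None when even n successes would not be enough.
    """
    p1 = min_upper_proportion(lower_successes / n, n, k)
    needed = math.ceil(p1 * n - 1e-9)  # small guard against rounding noise
    return needed if needed <= n else None


# Enter with lower-group successes, read off the required upper-group successes.
for lower in (0, 10, 20, 30, 40):
    print(lower, min_upper_successes(lower))
```

Entering with the lower-group count and reading the returned value mirrors entering the graph at the left margin and leaving at the bottom.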

The graph located forty-six items that were definitely selective, 
fifteen that were non-selective, and four that were adversely selective. 
Among the selective items were some that barely passed the minimum 
arbitrary standard and some that showed very high discrimination 
between students who later succeeded well and those who succeeded 
poorly. A few examples of highly discriminating items are presented 
below: 










HIGHLY SELECTIVE ITEMS 


3. Subtract 47.28 from 29.
9. √.0064 = ?

28. Professor Subscript decided to use the most economical plan
possible to identify his children without resorting to customary
names. He used B for boy and G for girl with numerical subscripts
affixed for identification. The order of births of his six
children was: boy, boy, girl, boy, girl, girl. (a) What did he call
the fourth child? (b) The fifth?

Use the following information to answer questions 41 to 45:
Σ is a summation symbol. It is read “the sum of.”
N refers to the number of items of data.
X refers to a raw measure in one group of measures.
Y refers to a raw measure in a second group of measures.
For example, X: 12, 8, 16, 4, 10
             Y: 5, 7, 3, 6, 4, 5
Use the data of the example to make proper substitutions in the
generalized expressions below and in each instance reduce the
substituted values to a single quantity.

44. Vi Y)    45. N, ?
47. Study the two situations described below before attempting to 
answer the question. 

Situation A: The daily minimum temperatures for a year (364 
readings) have been tabulated (364 tally marks) 

Situation B: For the same period described in A, the weekly 
averages (52 averages of seven minimum temperatures each) 
have been tabulated (52 tally marks) 

Which, if either, of these tabulations will have the wider dispersion
of tally marks? Use one of the following answers: A > B, A = B, or A < B.


These reveal the nature of some of the abilities or acquired knowledge
that appear to be required for success in the beginning study of statistics.
Items 3, 9, and 28 appear to represent knowledge that the student
has had ample opportunities to acquire. Items 41 to 45 may test
the ability of the student to make mental adjustments to situations 
that are novel to him. Item 47 appears to reach a power of abstract 
thinking required to visualize the narrowing of variability when means 
of samples from a population are considered. 
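
The narrowing of variability that item 47 asks the student to visualize can be checked with a small simulation. The sketch below is illustrative only: the temperatures are hypothetical independent normal draws (real daily readings are seasonal and correlated), and the mean of 50 and standard deviation of 15 are arbitrary assumptions.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Situation A: 364 hypothetical daily minimum temperatures.
daily = [random.gauss(50, 15) for _ in range(364)]

# Situation B: 52 weekly averages of seven consecutive readings each.
weekly = [statistics.mean(daily[i:i + 7]) for i in range(0, 364, 7)]

sd_daily = statistics.pstdev(daily)
sd_weekly = statistics.pstdev(weekly)

# The weekly averages are spread less widely than the daily readings (A > B).
print(sd_daily > sd_weekly)
```

The inequality does not depend on the simulated values: for equal-sized groups the variance of the group means can never exceed the variance of the individual readings.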

Not all of the non-selective items were discarded, however. Even 
though such items have doubtful power of discriminating in favor of 
good students, some of the easier ones may be used for test motivation 
and some of the more difficult ones for perfecting the scale of difficulty. 
Actually discrimination is seldom found in extremely easy items and is 
often not found in extremely difficult ones. 

In most testing, and especially in the aptitude testing concerned 
here, it is desirable to have a normal distribution of scores to insure 
equal discrimination throughout the length of the score scale. If there 
is a pile-up of items at a high difficulty level, there will be a converse 
pile-up of scores at a low score level. 

To determine the final difficulty of each item the responses of all the 
students—171 in number—were taken into account [5]. After the sur- 
viving items were tabulated on the basis of difficulty, withdrawals 
were made at scale positions of high frequency until forty remaining 
items were shaped into a normal distribution of difficulty. These were
arranged in order of difficulty and renumbered for final mimeograph- 
ing. 

It should be mentioned here that the validity of the scores made by 
171 students on the experimental test of sixty-five items as determined 
by the correlation of those scores with final course test scores was .63. 

When the shortened but refined aptitude test of forty items was 
given to seventy-six other students with a working-time limit of forty- 
five minutes the next semester as a pre-test to the same course, the 
validity determined similarly at the end of the course was found to be 
raised to .72. The reliability of this test as determined by the Spear- 
man-Brown step-up formula applied to the correlation coefficient of 
scores on odd numbered items against scores on even numbered items 
was .95. 
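
For readers unfamiliar with the step-up formula, a minimal Python sketch follows. The odd-even correlation itself is not reported in the text; the value computed here is only what a stepped-up reliability of .95 would imply under the standard two-fold Spearman-Brown formula, and the function names are hypothetical.

```python
def spearman_brown(r_part, factor=2.0):
    """Reliability of a test lengthened by `factor`, given part reliability."""
    return factor * r_part / (1 + (factor - 1) * r_part)


def implied_half_correlation(r_full):
    """Invert the two-fold step-up: half-test correlation implied by r_full."""
    return r_full / (2 - r_full)


r_half = implied_half_correlation(0.95)   # implied odd-even correlation
print(round(r_half, 3))
print(round(spearman_brown(r_half), 2))   # stepping up recovers the reliability
```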

The regression equation for predicting success in the first course in 
statistics has greatest general usefulness if expressed in T-scores for 
both variables—aptitude measures and final end-course measures. The 
equation is 


Y = .72X + 14.00 ± 4.65.


For example, if a boy makes a T-score of 70 on the aptitude test he 
may be expected to complete the course with a T-score standing of 64 
which is high enough, even with the tolerance necessitated, practi- 
cally to insure good success for him. 
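
The worked example can be reproduced with a short Python sketch. It assumes the usual T-score scale (mean 50, standard deviation 10), under which the constant 14.00 equals 50(1 − .72), and it reads the 4.65 as the tolerance to be allowed on either side of the estimate. The function name is hypothetical.

```python
def predict_course_t_score(aptitude_t, r=0.72, tolerance=4.65):
    """Return (low, estimate, high) for the end-of-course T-score."""
    intercept = 50 * (1 - r)          # 14.00 when r = .72
    estimate = r * aptitude_t + intercept
    return estimate - tolerance, estimate, estimate + tolerance


low, estimate, high = predict_course_t_score(70)
print(round(low, 2), round(estimate, 2), round(high, 2))
```

For an aptitude T-score of 70 the estimate is 64.4, which the text rounds to 64.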


CONCLUDING STATEMENTS 


The simple-recall method of response met the requirements of the 
present users of the test who have been dealing with very limited 
groups. If mass use of it should be needed, a revision to multiple choice 
responses for convenience of IBM answer sheets might be found de- 
sirable. 

A few office managers who have used the test have reported to the 
writer that it has been helpful in selecting office workers whose duties 
include some simple statistical work. 

(The test is not copyrighted. The writer will furnish a copy upon 
request.) 


POPULATION PROJECTIONS FOR SALES FORECASTING*


POPULATION projections are used for many purposes, but the practical
uses generally have one objective in common—planning for the
future. Many types of sales forecasting, in which the objective is to 
estimate the future demand and to plan the volume of future output 
of some commodity, have as one ingredient in the forecast an estimate 
of the size of the potential market in terms of the number of persons 
of a specified group.¹ Knowledge of just how a sales forecaster goes from
population projections to forecasts of sales and of what ingredients 
other than population he puts into his forecasting equation is beyond 
the area of competence of most demographers. In fact, much or even 
most of the work beyond the stage of making the population projections 
(and even some of the latter work) is not made public because of compe- 
tition within a field of business. 

In this paper we propose to review the status of population projec- 
tions in the United States, with particular reference to the types of 
projections that are available and the types of problems in their use 
that would be faced by a sales forecaster. We make no attempt to pro- 
pose methods of converting population projections to sales forecasts. 
Our preoccupation in this article with the population component in 
sales forecasts should not be construed to mean that it is believed to be 
the sole element of importance or even necessarily the most important 
element. 

* Presented before the Business and Economic Statistics Section at the 111th Annual Meeting
of the American Statistical Association in Boston on December 27, 1951. It was not possible to take into
account data and materials which became available after December 15, 1951.

¹ One indication of the fact that private business makes frequent use of population data in this
connection is that it has been an important consumer of the population projections released by the
Census Bureau in the period since the Census Bureau began publishing such figures. From 1943 to 1950,
for example, more than a third of all letters answered by that Bureau supplying information on
population projections were from private business and manufacturing firms.


NATIONAL PROJECTIONS


Projections of the total population of the United States have been
made by various methods for more than a century, in fact since colonial
days [1]. In the 1920’s and 1930’s, Warren S. Thompson and P. K.
Whelpton of the Scripps Foundation for Research in Population Problems
developed a method of projecting the population that has come
to be known as the “cohort-survival” technique [2]. This is the only
method we will consider at the national level. It has been used for the
last few decades in preparing the most widely accepted series of
projections and has been adopted as the method for national projections
in the official releases of the United States Bureau of the Census [3, 4].
The method involves first projecting trends in age-sex-specific mortality
and age-specific fertility, often separately for race and nativity
groups. Then from a given current date, for which population estimates
or counts are available, each (age-sex-race-nativity) group (cohort) of
the population is subjected to the appropriate mortality and fertility
rates for a particular subsequent period (usually one or five years),
and a balance is taken for the terminal date of the period to get an
estimate of the surviving population. This process is then repeated with
the age-specific mortality and fertility rates for the next higher age
group in the next one- or five-year time period, and so on. Some
assumptions are also required as to the volume and age-sex composition of
net population change through international migration.
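
The cohort-survival steps described above can be sketched in Python as follows. This is a deliberately simplified illustration, not the Census Bureau's procedure: it treats a single group without distinguishing sex, race, or nativity, assumes the projection period equals the width of each age group, and uses hypothetical rates and population figures throughout.

```python
def project_one_period(pop, survival, fertility, infant_survival=1.0,
                       net_migration=None):
    """Advance a population one period (equal to the age-group width).

    pop[i]        persons in age group i at the start of the period
    survival[i]   share of group i surviving into group i + 1; the last
                  entry applies to the open-ended final group
    fertility[i]  births per person in group i during the period
    """
    n = len(pop)
    births = sum(p * f for p, f in zip(pop, fertility))
    projected = [0.0] * n
    projected[0] = births * infant_survival
    for i in range(n - 1):
        projected[i + 1] += pop[i] * survival[i]
    # Survivors of the open-ended last group remain in that group.
    projected[n - 1] += pop[n - 1] * survival[n - 1]
    if net_migration is not None:
        projected = [p + m for p, m in zip(projected, net_migration)]
    return projected


# Hypothetical figures (thousands) for four broad age groups.
population = [100.0, 90.0, 80.0, 70.0]
survival = [0.98, 0.97, 0.90, 0.50]
fertility = [0.0, 0.5, 0.2, 0.0]

after_one = project_one_period(population, survival, fertility, infant_survival=0.95)
after_two = project_one_period(after_one, survival, fertility, infant_survival=0.95)
print([round(x, 1) for x in after_one])
print([round(x, 1) for x in after_two])
```

Repeating the call carries each cohort into the next higher age group, which is the repetition the text describes; in practice the rates would themselves be projected and would change from period to period.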

The Scripps Foundation developed population projections with the
cooperation of, or some participation on the part of, several federal
agencies during the 1930 and 1940 decades. In 1941, the Census Bureau
republished some national population projections that had been made
by the Scripps Foundation. In 1945, cooperative work by the Census
and Scripps was initiated that resulted in the issuance of a monograph
presenting projections to 1975 [5]. The latest report on national
projections was issued by the Census Bureau alone in August 1950 [4].² It
seems safe to conclude that the Census Bureau has now assumed a
continuing responsibility for providing national projections for dates
extending several years into the future and of revising its projections when
necessary and feasible. The latest projections provide a high, medium,
and low series for each year to 1960, by age and sex, and the Census
Bureau has extended the projections of the total population given in
the report to 1975 for the Bureau of Agricultural Economics on an
unofficial basis. They were based on the current estimates for July
1949. Two years later, the medium projection of the total population
departed from the current estimate by only a small amount [4, 6].
Since July 1951, the population has been increasing at a somewhat
higher rate than implied by the medium series. This higher rate of
growth is due to an increase in births following the upturn in the
marriage rate after the outbreak of hostilities in Korea. Further comments
on the differences between current estimates and projections will be
given later.

² Following the preparation of this paper, the Census Bureau issued a provisional revision of the
projections of the total population of the United States given in the report mentioned. See Current
Population Reports, Series P-25, No. 58, April 17, 1952.

Perhaps the most basic problem for the sales forecaster to consider 
in his use of population projections is the specification of the population 
that forms the potential market for his commodity. In times of war or 
partial mobilization, he must decide whether his consumers include 
only the civilian population, the civilian population and members of 
the armed forces stationed in the United States, or the total population 
of the United States, including troops stationed overseas. He must also 
decide whether his potential market is essentially one of individuals, 
as in the case of cigarettes and clothes, or one of families or households, 
as in the case of refrigerators and household furniture. 

If the market is one of individuals, it may be composed of only cer- 
tain groups, such as males or females only, certain age-sex groups, or 
groups classified according to other characteristics, such as marital 
status, country of birth (for the foreign-born), educational attainment, 
school enrollment, etc. Fortunately, the cohort-survival method pro- 
vides projections in each age-sex group of the population directly. Also, 
most age-sex groups can be estimated by this method rather closely. 
The precision of a projection for a group that is already born is much 
greater than for one not yet born, mainly because of the difficulty of 
predicting births. Thus, the projections for 1960 of any age group of the 
population 11 years of age and over are subject to error only in the 
assumptions as to mortality and foreign immigration between 1949 and 
1960, but not in the assumptions about fertility. For example, the latest 
Census Bureau projections for the group under 5 years of age in 1960 
differ by more than 50 per cent from the low to the high series, whereas 
the projections for the group 18 years and over differ by 5 per cent and 
for the group 15 to 19 years by only 13 per cent. Hence, the forecaster 
of sales in 1960 for false teeth, wheel chairs, beer, or magazines read 
only by adults can be provided appropriate population projections with 
a much smaller range of error than can the manufacturers of baby 
foods, Hopalong Cassidy belts, or toy guns. 

It is our observation that, fortunately, most sales forecasters are 
generally satisfied with projections that go no farther than about 10 
years into the future (in contrast with those who need long-run pro- 
jections for planning in such areas as reclamation). Since also many of 
their markets are not importantly affected by the size of the child 
population that is yet to be born in this 10-year period, many sales 
forecasters concerned with markets composed of individuals may regularly
expect to have at hand rather reliable national forecasts of the
population in which they are interested. Projections for more distant 
dates will be available much less frequently, and will be less reliable 
than the short-run forecasts—especially the projections of totals but 
also the projections of those population groups already born. 

If the market is one of families or households, the aggregates that are 
relevant are not total population projections, but family or household 
projections. Less is available on projections of families than of popula- 
tion. During the decade of the forties, in addition to its annual series 
of current estimates of families beginning in 1944, the Bureau of the 
Census published projections of families on two occasions, the second 
report appearing in 1946 [7]. The latest current estimates relate to 
April 1951 [8]. 

Although new series of family projections may be issued from time 
to time, it is not anticipated that they will be available on a regular 
basis and sales forecasters needing such figures may have to develop 
them on their own from the available data. From a bench mark date 
when data on both families and population are available, there are 
several simple ways of proceeding to derive projections of families from 
population projections. In order of increasing complexity one might 
assume that: (1) the number of families would change at the same rate 
as the population; (2) the number of families would change at the same 
rate as the population 20 years of age and over; (3) the trend in mean 
size of family observed over recent decades would continue (and then 
one would divide the population projections by the projected mean size 
of family); and (4) the trend in the proportion of heads of families at 
each age would continue (and then one would multiply the population 
at each age by these proportions). Any refined projection of families or 
households would take into account other factors, such as the future 
marriage rate, shifts in age at marriage, the rate of doubling up or of 
undoubling of families, etc.; but for many sales forecasting problems, 
one of the simpler methods of going from population to families may be 
satisfactory. 
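
The four simple methods listed above might be sketched as follows. The Python is illustrative only: every figure is hypothetical, the function names are introduced here, and the projected mean family size and headship proportions are simply supplied as inputs rather than extrapolated from past trends.

```python
def families_same_rate(base_families, base_pop, projected_pop):
    """Methods (1) and (2): families change at the same rate as a population.

    Pass the total population for method (1), or the population
    20 years of age and over for method (2).
    """
    return base_families * projected_pop / base_pop


def families_from_mean_size(projected_pop, projected_mean_size):
    """Method (3): divide the population by the projected mean family size."""
    return projected_pop / projected_mean_size


def families_from_headship(projected_pop_by_age, head_proportions):
    """Method (4): apply the projected proportion of family heads at each age."""
    return sum(p * h for p, h in zip(projected_pop_by_age, head_proportions))


# Hypothetical figures in millions.
print(families_same_rate(40, 150, 165))
print(round(families_from_mean_size(165, 3.5), 1))
print(round(families_from_headship([50, 60, 30], [0.1, 0.5, 0.6]), 1))
```

The three calls will generally give different answers for the same population projection, which is one reason the choice among the methods matters.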

A special problem met by users of population projections is set in 
sharp relief by the fact that the same part of the Census Bureau that 
makes the population projections also issues the current estimates of 
the population. When the present catches up with the future, there is 
the problem of splicing them together. This problem is not eliminated 
when, because of excellent technical work or good luck, the medium 
series of projections correspond very closely with the current estimates 
(or a decennial census). As we move in time from the date from which
the projections “took off,” the range from “low” to “high” shown by 
the projections is narrowed to zero for all dates prior to and including 
the present by substitution of current estimates for the projections.
However, for subsequent dates, unless some modification is made in the 
projections, the range remains as wide as it was when projected without 
the knowledge of the actual population change that has occurred be- 
tween the base date of the projections and the present. The problem 
becomes more complicated when the actual population change does 
not follow the medium projection closely. We shall illustrate two ways 
of “patching up” projections of the total population by use of current 
estimates for later dates and one way of “patching up” projections of 
the age distribution. 

In 1947, the Bureau of Agricultural Economics needed total popula- 
tion projections to 1975 for the United States for a study on long-range 
trends affecting American agriculture [9]. The Census Bureau’s current 
estimates were then running approximately 0.8 million above the 
highest series of projections then available and about 1.8 million above 
the medium series (assuming no net immigration after the base date). 
Each of these series was adjusted upward to allow for the greater than 
projected increase between July 1945 (the base date) and July 1947, 
the adjustment in each case being graduated to zero in 1975. 
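
One way such a graduated adjustment could be computed is sketched below in Python. The source says only that the adjustment was graduated to zero in 1975; the linear taper, the function name, and the use of the 1.8 million gap as the 1947 starting value for the medium series are assumptions made for illustration.

```python
def graduated_adjustment(year, gap, start_year=1947, end_year=1975):
    """Upward adjustment for `year`, tapering linearly from `gap` to zero."""
    if year <= start_year:
        return gap
    if year >= end_year:
        return 0.0
    return gap * (end_year - year) / (end_year - start_year)


# Hypothetical use: add the adjustment (in millions) to a projected total.
for year in (1947, 1950, 1960, 1975):
    print(year, round(graduated_adjustment(year, 1.8), 2))
```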

The second illustration involves adjustment of all three series of the 
latest projections two or so years following their point of departure into 
the future. As noted above, for July 1, 1951, the medium series was 
very close to the current estimate, so that little or no adjustment ap- 
peared necessary. According to more recent indications, however, it is 
anticipated that as of January 1, 1952, the current estimate will fall 
about midway between the medium and high projections implied for 
this date. Revised projections to 1960 in all three series have been pre- 
pared and are shown in Table 1 along with the original projections for 
comparison. A revised medium projection for July 1, 1952, was first 
made by simple extrapolation of the monthly and annual changes ob- 
served up to November 1, 1951. Revised low and high projections for 
July 1, 1952, and revised projections in all three series for July 1, 1953 
to 1960, were then developed on the basis of the pattern of differences 
and changes from year to year in the annual percentage increases implied
in the original population projections.³ (For further details, see
the methodological note to Table 1.) 





³ A somewhat different methodology was employed in preparing the revised projections of total
population issued by the Census Bureau in April.
Rough adjustments of age projections may be made by multiplying
the original projections in a given age cohort by the ratio of the census
count or the current estimate for the group to the original projection
at the census or current date. For example, the latest Census Bureau
projections [4] for the groups 20 to 24 years in 1955 and 25 to 29 in
1960 may be adjusted by the ratio of the final census figure for the
group 15 to 19 years to the projection for that group in 1950.


TABLE 1

ILLUSTRATIVE REVISION OF THE PROJECTIONS OF THE TOTAL POPULATION
OF THE UNITED STATES (INCLUDING ARMED FORCES OVERSEAS),
FOR JULY 1, 1952 TO 1960, WITH LATEST CENSUS BUREAU
PROJECTIONS FOR COMPARISON

                     Revised projections           Census Bureau projections*
                  Low       Medium    High         Low       Medium    High
                  series    series    series       series    series    series

July 1, 1949      149,149†  149,149†  149,149†     149,215   149,215   149,215
July 1, 1950      151,689†  151,689†  151,689†     151,382   151,836   152,056
July 1, 1951      154,353†  154,353†  154,353†     153,273   154,179   154,930
July 1, 1952      156,908   157,178   157,517      154,902   156,358   157,800
July 1, 1953      159,082   159,744   160,616      156,246   158,340   160,632
July 1, 1954      160,897   162,057   163,733      157,335   160,138   163,431

July 1, 1955      162,353   164,109   166,688      158,176   161,748   166,179
July 1, 1956      163,685   166,124   169,715      159,005   163,397   169,033
July 1, 1957      164,824   167,986   172,681      159,758   164,973   171,862
July 1, 1958      165,785   169,708   175,597      160,453   166,492   174,679
July 1, 1959      166,573   171,299   178,468      161,097   167,966   177,492
July 1, 1960      167,175   172,732   181,267      161,679   169,371   180,276

* Figures from Bureau of the Census, Current Population Reports, Series P-25, No. 43. Following
the preparation of this paper, the Census Bureau issued a provisional revision of these projections
(Current Population Reports, Series P-25, No. 58, April 17, 1952) which differs from the revision given
in this table.

† Latest current estimate from Current Population Reports, Series P-25, No. 48 or No. 55.

Methodological note: A revised medium projection for July 1, 1952, was obtained by direct
extrapolation of the annual and monthly changes in the population observed up to November 1, 1951, and
the implied percentage increase between July 1, 1951, and July 1, 1952, was then computed. Comparable
percentage increases for the low and high series for this period were then obtained, respectively, by
subtracting from this medium percentage increase one-half (only the half year from January 1, 1952,
to June 30, 1952, to be predicted) of the difference between the low and medium percentage increases
for 1951-52 in the original population projections, and by adding to the medium percentage increase
one-half of the difference between the medium and high percentage increases in the original
population projections. Then, the changes from year to year in the percentage increases in the original
projections were applied to the revised percentages for 1951-52, in order to obtain preliminary revised
percentages for 1952-53 to 1959-60. These preliminary percentages were then adjusted downward
linearly to tie in with the original percentage changes for 1959-60. Finally, the adjusted percentage
changes were applied in sequence to the current estimate for July 1, 1951.

The procedure meets the conditions of (1) a smooth juncture of the projections with the current
estimates, (2) a rather small range from the low to the high series in the 1951-52 period, (3) a somewhat
narrower range from the low to the high series in 1960 in the revised projections than in the original
projections, and (4) a greater upward revision in the low series than in the high series.
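
The first step of the methodological note to Table 1 can be traced numerically. The Python sketch below covers only the 1951-52 step for the low and high series, taking the revised medium figure for July 1, 1952 as given; the variable names are introduced here, and the later steps of the note (carrying the changes forward and tying in at 1959-60) are not attempted.

```python
def pct_increase(later, earlier):
    """Proportional increase from `earlier` to `later`."""
    return later / earlier - 1


# Figures as printed in Table 1.
estimate_1951 = 154_353          # current estimate, July 1, 1951
revised_medium_1952 = 157_178    # revised medium projection, July 1, 1952
original_1951 = {"low": 153_273, "medium": 154_179, "high": 154_930}
original_1952 = {"low": 154_902, "medium": 156_358, "high": 157_800}

# Percentage increases for 1951-52 implied by the original projections.
orig = {s: pct_increase(original_1952[s], original_1951[s]) for s in original_1951}

# Revised medium increase, then half the original spread on either side.
medium = pct_increase(revised_medium_1952, estimate_1951)
low = medium - 0.5 * (orig["medium"] - orig["low"])
high = medium + 0.5 * (orig["high"] - orig["medium"])

revised_low_1952 = estimate_1951 * (1 + low)
revised_high_1952 = estimate_1951 * (1 + high)
print(round(revised_low_1952, 1), round(revised_high_1952, 1))
```

Under this reading of the note, the two results fall within one unit of the revised low and high entries printed in Table 1 for July 1, 1952.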


PROJECTIONS FOR GEOGRAPHIC AREAS WITHIN
THE UNITED STATES


For many types of commodities, the market is not nation-wide but
is restricted to some geographic area within the United States—a broad
region, a state, a metropolitan area, some other type of economic area,
a city, or some other geographic or semi-geographic grouping such as
the rural or the farm population. Or, as is common, the market is
nation-wide but the sales program is organized along geographic area
lines. In such cases, the required population projections are generally
subject to a wider range of error than in the case of national projections.
This is true primarily because internal migration in the United
States is currently so much greater and so much less amenable to
measurement than migration across our national borders. In the ten
years between the 1940 and 1950 Censuses, net civilian interstate migration,
representing the absolute sum of the values of net migration estimates
for States, amounted to about 11 million, while the net civilian
immigration into the United States amounted to less than 2 million⁴
[10].

Methods of making population projections for geographic areas be- 
low the national level are less uniform and less standardized than in the 
case of national projections. This is due in part to the fact that internal 
migration makes for a wider range of error in area projections, and some 
workers in this field therefore feel that the more elaborate cohort- 
survival method is not justified. In part, it is due to the fact that the 
types of data needed are not available for the areas involved. And in 
part, the situation is due to the fact that there has been very little in 
the way of research to test empirically the relative validity of the 
several methods available. A notable exception is a project by Robert 
C. Schmitt and Albert H. Crosetti described in the November 1951 
issue of Land Economics [11]. Another exception is a project going on in 
the Census Bureau, in which projections for states for 10 and 20 years 
into the future, developed by various methods on the basis of data for 
1930 and earlier years, are being compared with the census counts of 
1940 and 1950. 

4 These figures have not been adjusted for possible deficiencies in the reported data on immigration 
and emigration. 

In general, most of the methods used in making population 
projections for geographic areas are of four types: (1) The cohort-survival 
method, as used for the national projections but with appropriate 
modifications; (2) simplifications of this type of method which involve 
projecting natural increase and migration separately, but not by age and 
sex; (3) “ratio-to-United States” techniques, which will be discussed 
more fully below; and (4) extrapolation of the population solely on the 
basis of its own past trend. Examples of each of the four types are 
found in the bibliography of a recent article published by the authors 
in Agricultural Economics Research [12]. Another example of the fourth 
type is a recent projection of the farm population made by the Bureau 
of Agricultural Economics [13]. Additional examples of the use of the 
ratio method, particularly for projecting the population of cities and 
metropolitan areas, are mentioned in the Schmitt-Crosetti article cited 
[11]. 

In 1934, projections for states to 1960, by residence groups within 
states (urban, rural-nonfarm and rural-farm), prepared by the Scripps 
Foundation, were published by the National Resources Planning 
Board [14]. Since then, no comprehensive set of population projections 
has been prepared for any particular geographic grouping of the popula- 
tion of the country by the cohort-survival method. Early last year the 
authors of this paper published population projections to 1975 for the 
total population of the nine major geographic divisions and projections 
of the age-sex composition of the population of the four major geo- 
graphic regions to 1960, using the general principle of the ratio-to- 
United States method [12]. While many projections for individual 
states, cities, metropolitan areas, and other types of areas have been 
made by private business concerns, state and local governmental agen- 
cies, and research workers in universities and other research institu- 
tions, there exists no comprehensive, systematic compilation of all such 
projections or even a bibliography of those that have been published. 
Even the bibliography of recent literature on future population, cover- 
ing the period 1943-48, prepared by Irene Taeuber for the January 
1949 issue of Population Index, appears to have only a partial listing 
of projections for geographic areas within the United States for that 
period [15]. 

In January of this year the Census Bureau published three series of 
population projections for states for 1955 and 1960 in its Current Popu- 
lation Reports [16]. The figures are shown in Table 2. This represents 
an expansion in their program of population projections to meet de- 
mands from business, land economists, and various groups concerned 
with plans for the future that are affected by future population pros- 
pects of states. 

The method used in making these state projections is of the ratio- 
to-United States type and is relatively simple. First, the proportion of 
TABLE 2 

PROJECTIONS OF THE POPULATION OF REGIONS, DIVISIONS, AND STATES, 
FOR 1955 AND 1960, WITH CURRENT FIGURES FOR 1950 

(Totals shown may differ from the sum of the parts shown because of rounding. Figures for 1950 
relate to April 1, and represent 1950 Census data adjusted to include members of the armed forces 
residing in the area at the time of entry into the armed forces and to exclude all other members of the 
armed forces stationed in the area in April 1950. The projections relate to July 1 and are designed to 
cover a similar population.) 

[In thousands] 

                         Adjusted     Low series       Medium series       High series
Region, division,            1950     1955     1960     1955     1960     1955     1960
and state

UNITED STATES            151,116*  158,176  161,679  161,748  169,371  166,179  180,276

REGIONS:
  Northeastern States      39,713   40,893   41,109   41,817   43,064   42,962   45,837
  North Central States     44,748   46,268   46,724   47,313   48,947   48,609   52,099
  The South                47,188   49,243   50,226   50,355   52,615   51,734   56,003
  The West                 19,466   21,772   23,620   22,264   24,744   22,873   26,337

NORTHEASTERN STATES:
  New England               9,364    9,611    9,649    9,828   10,108   10,097   10,759
  Middle Atlantic          30,349   31,282   31,460   31,989   32,957   32,865   35,079

NORTH CENTRAL STATES:
  East North Central       30,583   31,942   32,567   32,663   34,117   33,558   36,313
  West North Central       14,165   14,326   14,157   14,650   14,831   15,051   15,785

THE SOUTH:
  South Atlantic           21,100   22,363   23,102   22,868   24,201   23,494   25,760
  East South Central       11,532   11,821   11,870   12,088   12,435   12,419   13,236
  West South Central       14,555   15,059   15,253   15,399   15,979   15,821   17,008

THE WEST:
  Mountain                  5,075    5,465    5,701    5,588    5,972    5,741    6,357
  Pacific                  14,392   16,307   17,919   16,675   18,772   17,132   19,981

NEW ENGLAND:
  Maine                       924      947      946      968      991      995    1,055
  New Hampshire               538      552      552      564      579      580      616
  Vermont                     383      384      377      393      395      404      421
  Massachusetts             4,718    4,811    4,806    4,920    5,035    5,055    5,359
  Rhode Island                782      799      800      817      838      839      892
  Connecticut               2,019    2,118    2,167    2,166    2,270    2,225    2,417

MIDDLE ATLANTIC:
  New York                 14,909   15,420   15,546   15,768   16,286   16,200   17,335
  New Jersey                4,842    5,065    5,159    5,180    5,404    5,321    5,752
  Pennsylvania             10,598   10,798   10,755   11,041   11,266   11,344   11,992

EAST NORTH CENTRAL:
  Ohio                      8,003    8,345    8,508    8,534    8,913    8,767    9,487
  Indiana                   3,966    4,151    4,236    4,245    4,438    4,361    4,723
  Illinois                  8,738    9,032    9,119    9,236    9,553    9,489   10,168
  Michigan                  6,412    6,832    7,089    6,986    7,427    7,178    7,905
  Wisconsin                 3,464    3,582    3,614    3,663    3,786    3,763    4,030

WEST NORTH CENTRAL:
  Minnesota                 3,008    3,080    3,078    3,149    3,224    3,236    3,432
  Iowa                      2,644    2,667    2,631    2,728    2,756    2,802    2,934
  Missouri                  3,989    4,037    3,998    4,129    4,188    4,242    4,457
  North Dakota                625      619      598      633      626      650      666
  South Dakota                656      657      641      671      671      690      715
  Nebraska                  1,335    1,331    1,296    1,361    1,358    1,398    1,445
  Kansas                    1,908    1,936    1,916    1,979    2,007    2,034    2,137

SOUTH ATLANTIC:
  Delaware                    320      340      351      347      368      357      391
  Maryland                  2,327    2,508    2,625    2,564    2,750    2,635    2,927
  District of Columbia        778      805      827      823      867      846      923
  Virginia                  3,260    3,465    3,589    3,543    3,760    3,640    4,002
  West Virginia             2,032    2,082    2,086    2,130    2,185    2,188    2,326
  North Carolina            4,056    4,253    4,339    4,349    4,545    4,468    4,838
  South Carolina            2,117    2,193    2,214    2,243    2,319    2,304    2,468
  Georgia                   3,443    3,540    3,552    3,620    3,721    3,719    3,960
  Florida                   2,764    3,177    3,518    3,249    3,686    3,338    3,923

EAST SOUTH CENTRAL:
  Kentucky                  2,944    2,996    2,985    3,064    3,127    3,147    3,329
  Tennessee                 3,314    3,445    3,502    3,522    3,669    3,619    3,905
  Alabama                   3,088    3,176    3,202    3,248    3,354    3,337    3,570
  Mississippi               2,187    2,205    2,181    2,254    2,284    2,316    2,432

WEST SOUTH CENTRAL:
  Arkansas                  1,932    1,927    1,887    1,970    1,976    2,024    2,104
  Louisiana                 2,699    2,815    2,868    2,878    3,005    2,957    3,198
  Oklahoma                  2,250    2,227    2,168    2,277    2,271    2,340    2,418
  Texas                     7,674    8,090    8,330    8,273    8,726    8,500    9,288

MOUNTAIN:
  Montana                     595      611      608      625      637      642      678
  Idaho                       594      627      643      641      673      659      717
  Wyoming                     285      300      307      307      322      316      343
  Colorado                  1,323    1,407    1,449    1,438    1,518    1,478    1,616
  New Mexico                  675      744      787      760      824      781      877
  Arizona                     750      849      929      869      973      892    1,036
  Utah                        694      750      786      766      823      787      876
  Nevada                      159      177      192      181      202      186      215

PACIFIC:
  Washington                2,341    2,554    2,709    2,612    2,838    2,684    3,020
  Oregon                    1,533    1,700    1,831    1,739    1,918    1,786    2,041
  California               10,517   12,053   13,380   12,325   14,017   12,663   14,919

* This figure differs slightly from the corresponding figure for the same date published in Current 
Population Reports, Series P-25, No. 55, which includes among the United States armed forces 
overseas those whose preservice residence was in a United States Territory or possession. 

Source: U. S. Bureau of the Census, “Projections of the Population by States: 1955 and 1960,” 
Current Population Reports, Series P-25, No. 56, January 27, 1952. 

the total United States population in each major geographic division, 
and the proportion of each division’s population in each state included 
in the division, were projected from 1950 to 1955 and 1960 on the basis 
of recent past trends in such a way as to approach constancy by the end 
of the century. Next, the projected proportions for divisions were ap- 
plied to the latest projections of the total population of the United 
States for 1955 and 1960 to obtain population projections for divisions. 
Finally, the projected proportions for states were applied to these popu- 
lation projections for divisions to derive the state population projec- 
tions. By using low, medium, and high projections of United States 
population for future dates, low, medium, and high series for states 
were obtained.⁵ (To derive revised state projections, taking account of 
the revised national totals for future years presented in this article, it 
is simply necessary to multiply the percentage of the population in 
each state as computed from Table 2 by the revised national totals 
given in Table 1.) 
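
The two-stage multiplication described above, and the shortcut for re-basing state figures on a revised national total, can be sketched in Python. The function names are illustrative. The shares are backed out of the Table 2 medium series for 1960, whereas the Census Bureau projected them from past trends, and the revised national total is a made-up placeholder rather than a Table 1 value.

```python
def project_states(us_total, division_share, state_share_within_division):
    """Ratio-to-United States projection: national total, then divisions, then states."""
    states = {}
    for division, share in division_share.items():
        division_total = us_total * share
        for state, state_share in state_share_within_division[division].items():
            states[state] = division_total * state_share
    return states


def revise_state_projection(state_value, old_us_total, revised_us_total):
    """Apply a state's share of the old national total to a revised national total."""
    return revised_us_total * (state_value / old_us_total)


# Medium series for 1960 from Table 2, in thousands.
us_1960 = 169_371
pacific_1960 = 18_772
division_share = {"Pacific": pacific_1960 / us_1960}
state_share = {"Pacific": {"Washington": 2_838 / pacific_1960,
                           "Oregon": 1_918 / pacific_1960,
                           "California": 14_017 / pacific_1960}}
states_1960 = project_states(us_1960, division_share, state_share)

# Placeholder revised national total (hypothetical, for illustration only).
california_revised = revise_state_projection(14_017, us_1960, 171_000)
```

Holding each state's share fixed while the national total changes is exactly what the parenthetical shortcut does; it leaves the projected distribution among states untouched.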

A quick summary of the population outlook for the 48 states and the 
District of Columbia is provided by Table 3 and the chart. Both the 
table and the chart present prospective percentage changes in state 
population as implied by the state population projections. 
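
As an illustration of how such percentage changes follow from the projections, the sketch below recomputes a few medium-series changes for 1950-60 from the rounded Table 2 figures. Since Table 3 was computed from unrounded numbers, results obtained this way can differ from the published ones in the last digit.

```python
def percent_change(base, projected):
    """Percentage change from the base population to the projected population."""
    return 100.0 * (projected - base) / base


# (adjusted 1950, medium 1960) in thousands, from Table 2.
table2_medium = {
    "California": (10_517, 14_017),
    "Florida": (2_764, 3_686),
    "Oklahoma": (2_250, 2_271),
}
changes = {state: percent_change(base, proj)
           for state, (base, proj) in table2_medium.items()}
```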

It was not discovered until the population projections were com- 
pleted and the percentage changes were being computed for this chart 
that two different combinations of assumptions as to each division’s 
share of the national population and each state’s share of its division’s 
population had the effect of making California and Florida just about 
tie for first place in rate of increase. Under the medium assumptions 
each of these states is expected to have a population increase of about 
one-third between 1950 and 1960. Some of the reasons for past rapid 
population growth in these two states have been similar, but others 
have been different. The type of climate in both states has attracted 
not only retired persons and persons approaching retirement age, but also 
individuals and families throughout the entire age span. In geographic loca- 
tion, Florida has the advantage of being nearer to the large populous 
centers of the East, although California has the advantage of the 
glamour of the West. An important difference is the more rapid in- 
dustrial expansion in California than in Florida, caused in part by the 
fact that its distance from the East means that many of the largest 
eastern manufacturing firms have established branches there, which 
have greatly increased employment opportunities. The increase in this 





5 A more detailed description of the ratio-to-United States technique is given in our previous article 
[12]. 
TABLE 3 

PROJECTIONS OF THE PERCENTAGE CHANGE IN THE POPULATION OF 
REGIONS, DIVISIONS, AND STATES: 1950-1955 AND 1950-1960 

(Based on data in Table 2; unrounded numbers were used, however, in computing per cents) 

                             Low series       Medium series       High series
Region, division,
and state                1950-55  1950-60   1950-55  1950-60   1950-55  1950-60
TABLE 3 (Continued) 

                             Low series       Medium series       High series
Region, division,
and state                1950-55  1950-60   1950-55  1950-60   1950-55  1950-60

SOUTH ATLANTIC:
  Delaware                 + 6.0    + 9.5     + 8.4    +14.7     +11.4    +22.1
  Maryland                 + 7.8    +12.8     +10.2    +18.2     +13.2    +25.8
  District of Columbia     + 3.5    + 6.4     + 5.9    +11.5     + 8.8    +18.7
  Virginia                 + 6.3    +10.1     + 8.7    +15.3     +11.6    +22.8
  West Virginia            + 2.5    + 2.7     + 4.8    + 7.5     + 7.7    +14.5
  North Carolina           + 4.8    + 7.0     + 7.2    +12.1     +10.1    +19.3
  South Carolina           + 3.6    + 4.5     + 5.9    + 9.5     + 8.8    +16.6
  Georgia                  + 2.8    + 3.1     + 5.1    + 8.1     + 8.0    +15.0
  Florida                  +14.9    +27.3     +17.5    +33.3     +20.8    +41.9

EAST SOUTH CENTRAL:
  Kentucky                 + 1.8    + 1.4     + 4.1    + 6.2     + 6.9    +13.1
  Tennessee                + 3.9    + 5.7     + 6.3    +10.7     + 9.2    +17.8
  Alabama                  + 2.8    + 3.7     + 5.2    + 8.6     + 8.0    +15.6
  Mississippi              + 0.8    - 0.3     + 3.1    + 4.4     + 5.9    +11.2

WEST SOUTH CENTRAL:
  Arkansas                 - 0.3    - 2.4     + 2.0    + 2.3     + 4.8    + 8.9
  Louisiana                + 4.3    + 6.3     + 6.6    +11.3     + 9.6    +18.5
  Oklahoma                 - 1.0    - 3.6     + 1.2    + 0.9     + 4.0    + 7.4
  Texas                    + 5.4    + 8.6     + 7.8    +13.7     +10.8    +21.0

MOUNTAIN:
  Montana                  + 2.6    + 2.1     + 4.9    + 6.9     + 7.8    +13.8
  Idaho                    + 5.5    + 8.1     + 7.9    +13.3     +10.9    +20.5
  Wyoming                  + 5.4    + 7.9     + 7.8    +13.0     +10.7    +20.6
  Colorado                 + 6.3    + 9.6     + 8.8    +14.8     +11.7    +22.3
  New Mexico               +10.2    +16.6     +12.7    +22.2     +15.8    +30.1
  Arizona                  +13.2    +23.8     +15.8    +29.7     +19.0    +38.1
  Utah                     + 8.1    +13.3     +10.5    +18.7     +13.5    +26.3
  Nevada                   +11.6    +21.2     +14.1    +27.0     +17.2    +35.1

PACIFIC:
  Washington               + 9.1    +15.7     +11.6    +21.2     +14.6    +29.0
  Oregon                   +10.9    +19.4     +13.4    +25.1     +16.5    +33.1
  California               +14.6    +27.2     +17.2    +33.3     +20.4    +41.9

and other types of nonagricultural employment opportunities in Cali- 
fornia is also due to the momentum of a rapidly increasing market 
already of large size. While it is not likely that Florida could become so 
autonomous as California from the dominance of northern manufactur- 
ing centers, it is quite possible that its increasing population may begin 
to have a greater effect than earlier as a stimulus to many types of non- 
agricultural employment other than those which directly service the 
tourist and resort type of trade. 

The other states with high projected growth rates are those that 
border California or are just one state removed. Those for which the 
medium projections show a growth rate of 20 per cent or more between 

































POPULATION PROJECTIONS 537 


1950 and 1960 are Arizona, Nevada, Oregon, New Mexico, and Wash- 
ington. In general, factors related to national security, such as atomic 
energy developments, minerals, shipbuilding, aircraft manufacturing, 
and naval installations have been most important in pulling population 
to these states, although climate has also undoubtedly played a part. 

Under the medium assumptions, no state is expected to have a 
decrease in population between 1950 and 1960. The states with a 
projected increase of less than 6 per cent form a stripe down the center 
of the United States—North Dakota, South Dakota, Nebraska, Iowa, 
Kansas, Missouri, Oklahoma, Arkansas, and Mississippi—with Ver- 
mont as the only state in this category that is not contiguous. We wish 
to stress, however, that these projections are based on extrapolation of 
past trends. To date, fear of atomic warfare has not caused any large- 
scale movement of industry or population to the middle part of the 
United States, which presumably would be a safer location in case of 
enemy attack. Nor have any methods of utilizing atomic energy for 
civilian purposes been put into effect that would have important effects 
in altering our pattern of industrial location. If either of these develop- 
ments occurs between now and 1960, it could well produce population 
growth rates in the states very different from those projected by the 
Census Bureau. 

All other states fall into the range of projected population growth 
of between 6 and 20 per cent between 1950 and 1960 under the medium 
assumptions. Those with prospective increases between 14 and 20 per 
cent include Delaware, Maryland, Michigan, Virginia, Colorado, and 
Utah. The states not already named and the United States as a whole 
fall into the interval of prospective population growth of 6 to 14 per 
cent. 
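
The grouping of states by projected growth used in the preceding paragraphs can be expressed as a small classification rule. The sketch below is only illustrative: treating each interval as closed below and open above is an assumption, since the text does not say how boundary cases are handled. The sample values are medium-series changes for 1950-60 taken from Table 3.

```python
def growth_class(pct_change):
    """Assign a projected 1950-60 percentage change to the intervals used in the text.

    Intervals are taken as closed below and open above (an assumption).
    """
    if pct_change < 0:
        return "decrease"
    if pct_change < 6:
        return "under 6 per cent"
    if pct_change < 14:
        return "6 to 14 per cent"
    if pct_change < 20:
        return "14 to 20 per cent"
    return "20 per cent or more"


# Medium-series changes, 1950-60, from Table 3.
medium_1950_60 = {"Florida": 33.3, "Arizona": 29.7, "Virginia": 15.3,
                  "Georgia": 8.1, "Oklahoma": 0.9}
classes = {state: growth_class(pct) for state, pct in medium_1950_60.items()}
```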

In general, the prospects for change in the number of families or 
households in each state roughly parallel those for the total population. 
The assumption underlying this generalization corresponds to the crude 
device for projecting families given first among the methods suggested 
earlier. If somewhat more precise indications are needed, the other de- 
vices suggested can be employed. 

The ratio method is capable of various extensions that may be useful 
to sales forecasters. Three types that seem likely to be relevant to their 
problems are as follows: (1) extension to age-sex or other types of 
population groups for divisions or states by the methods presented in 
our previous article [12] or by simplifications of these methods; (2) ex- 
tensions to other dates by interpolation or extrapolation; (3) extensions 
to areas within states—large cities, standard metropolitan areas, eco- 
nomic subregions—by repeating at the next lower geographic order the 
processes used by the Census Bureau in developing the state projec- 
tions. 


FURTHER PROBLEMS 


In conclusion, we wish to formulate certain problems that need to be 
recognized by both population projectors and sales forecasters. The 
first is that the validity of the methods of making population projec- 
tions is an area in which more research is crucially needed. If the con- 
sumers of population projections want to be able to appraise the valid- 
ity of projections, research and testing in this field must be undertaken 
on a much wider scale than has been the case in the past. 

A second problem that should be faced more directly than it gener- 
ally has been by both population projectors and business users of their 
projections concerns the rejection, on the part of the latter, of the range 
of projections for a given future date insisted on by the demographer, 
on the ground, we are told, that the business users must furnish only a 
single forecast of sales to management. Demographers provide a range 
because it is impossible to predict population accurately. This range 
can be used to indicate in part the flexibility needed in planning sales 
programs. It is our hope that, in spite of the current predilections of 
management against the acceptance of a range in population projec- 
tions and sales forecasts, it will in time become educated to, and “sold 
on,” its rationale and usefulness. That the task will be hard is evidenced 
by the fact that management has apparently not shown any great 
willingness to accept the notion of sampling error in connection with 
estimates obtained in sample surveys—a concept roughly analogous to 
the concept of a range as related to nonsurvey estimates—even though 
the former has now become firmly established in scientific statistical 
thinking. The example presented earlier should suffice to convince one 
that the error of estimation varies widely from one population projec- 
tion to another and that a careful estimate of this error may serve 
better than a mere guess or even a considered judgment. 

The third problem that needs more explicit recognition and clarifica- 
tion is that of the mutual interrelationship between population and 
economic factors. No narrowly conceived, monocausal, deterministic 
view is any longer tenable. It is entirely possible that demographers 
should explicitly use economic projections for making population fore- 
casts 10 years into the future just as seriously as we think sales fore- 
casters should use population projections. 
HICKS’ “ELEMENTARY CASE” ECONOMIC MODEL 
FOR THE UNITED STATES, 1929-1941* 


Gene H. Fisher 
The Rand Corporation, Santa Monica, California 


I. INTRODUCTION 


MULTIPLIER analysis and the acceleration principle have for many 
years been important in macro-economic analysis. The multiplier 
theory provides a tool for analyzing the effect of a change in the level 
of aggregate investment on aggregate income. An hypothesis about the 
determination of aggregate investment is furnished by the acceleration 
principle. In its simplest form the acceleration principle states that 
investment is a linear homogeneous function of the rate of change of 
income; i.e., $I_t = \beta(dY/dt)$, where $I$ is investment, $Y$ is income, and 
$\beta$ is a parameter, the accelerator coefficient. 

Prior to R. F. Harrod’s The Trade Cycle (1936) and P. A. Samuelson’s 
brilliant paper on the multiplier and the accelerator published in 1939,¹ 
the multiplier theory and acceleration principle were rarely, if ever, 
both used in the same economic model. After the works of Harrod and 
Samuelson, however, it became very common to use both concepts in 
the same system. 

One of the most recent models utilizing the interaction between the 
multiplier and the accelerator is that of J. R. Hicks given in his book 
on the trade cycle.² Hicks presents what he calls an “elementary case” 
macro-economic model, which may be outlined formally as follows:³ 


(1)    $Y_n = A_n + cY_{n-1} + v(Y_{n-1} - Y_{n-2}),$ 





* The views expressed in this paper are personal and do not necessarily reflect those of The Rand 
Corporation. The author is indebted to Harry Markowitz and J. Y. Springer of The Rand Corporation 
for helpful comments, criticisms, and suggestions. 

1 Paul A. Samuelson, “Interactions Between the Multiplier Analysis and the Principle of Accelera- 
tion,” Review of Economic Statistics, 21 (May, 1939), 75-78. 

2 J. R. Hicks, A Contribution to the Theory of the Trade Cycle (Oxford: The Clarendon Press, 1950). 

3 Ibid., p. 86. For the “non-elementary” or general case Hicks defines the consumption function and 
the induced investment function as follows (see ibid., pp. 182-83): 

$C_n = c_1 Y_{n-1} + c_2 Y_{n-2} + \cdots + c_p Y_{n-p} + K, \qquad \sum_r c_r < 1,$ 
$I_n = v_1 (Y_{n-1} - Y_{n-2}) + v_2 (Y_{n-2} - Y_{n-3}) + \cdots + v_{p-1} (Y_{n-p+1} - Y_{n-p}).$ 

And therefore, 

$Y_n = A_n + \sum_{r=1}^{p} c_r Y_{n-r} + \sum_{r=1}^{p-1} v_r (Y_{n-r} - Y_{n-r-1}) + K.$ 

where 
Y = aggregate real output 
A = real autonomous investment 
c = the marginal propensity to consume (c < 1) 
v = the induced investment coefficient (v > 1) 


Equation (1) is derived in the following way. Hicks defines the con- 
sumption function in Robertsonian terms; i.e., real consumption in the 
current period is a function of “yesterday’s” aggregate real output. 
Symbolically, 


(2)    $C_n = cY_{n-1}.$ 


Induced investment is defined as that investment which results from 
changes in aggregate real output in the recent past (the acceleration 
principle): 


(3)    $I_n = v(Y_{n-1} - Y_{n-2}).$ 


Autonomous investment is considered as exogenous to the system and 
is defined to be “public investment, investment which occurs in direct 
response to inventions, and much of the ‘long-range’ investment (as 
Mr. Harrod calls it) which is only expected to pay for itself over a long 
period.”⁴ 

A definitional equation completes the system: 


(4)    $Y_n = A_n + C_n + I_n$ 
           $= A_n + cY_{n-1} + v(Y_{n-1} - Y_{n-2}).$ 


The purpose of the present paper is to attempt to estimate the 
parameters c and v in equation (1) for the prewar period 1929-1941 in 
the United States, and also to test the hypothesis that v > 1. It is well 
known that specifying v > 1 usually implies “explosive” behavior in 
aggregate real output over time.⁵ 

Hicks assumes v > 1 in his model; but along with this explosive speci- 
fication he postulates two restrictions which keep the system from 
running away: (1) a restraint on the upward expansion of real output 
in the form of a scarcity of employable resources, and (2) a technical 
limitation on the operation of the induced investment mechanism dur- 
ing the downswing.⁶ 


4 Hicks, op. cit., p. 59. 

5 See Hicks, ibid., Chap. VII, and pp. 184-86. For a discussion using a model very similar to Hicks’, 
see P. A. Samuelson, op. cit. 

6 Hicks, op. cit., p. 95. 
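
The explosive behavior associated with v > 1 can be illustrated numerically. With constant autonomous investment, deviations of output from its equilibrium level in equation (1) follow the characteristic equation $x^2 - (c+v)x + v = 0$, whose roots have product v, so that v > 1 forces at least one root outside the unit circle. The sketch below iterates equation (1) without Hicks' ceiling and floor; the parameter values and starting values are arbitrary assumptions chosen only for illustration.

```python
import math


def simulate_hicks(c, v, autonomous, y_start, periods):
    """Iterate Y_n = A + c*Y_{n-1} + v*(Y_{n-1} - Y_{n-2}) with constant A."""
    y = list(y_start)
    for _ in range(periods):
        y.append(autonomous + c * y[-1] + v * (y[-1] - y[-2]))
    return y


def largest_root_modulus(c, v):
    """Largest modulus among the roots of x^2 - (c + v)x + v = 0."""
    disc = (c + v) ** 2 - 4.0 * v
    if disc < 0:
        return math.sqrt(v)  # complex conjugate pair: modulus squared equals v
    root = math.sqrt(disc)
    return max(abs((c + v + root) / 2.0), abs((c + v - root) / 2.0))


# Arbitrary illustrative values (not estimates from the paper).
A, c = 20.0, 0.8
equilibrium = A / (1.0 - c)
start = [equilibrium, equilibrium + 1.0]   # a one-unit disturbance
explosive = simulate_hicks(c, 1.5, A, start, 40)
damped = simulate_hicks(c, 0.5, A, start, 40)

late_explosive = max(abs(y - equilibrium) for y in explosive[-10:])
late_damped = max(abs(y - equilibrium) for y in damped[-10:])
```

With these values the path for v = 1.5 oscillates with growing amplitude, while the path for v = 0.5 returns toward equilibrium.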
II. FORMULATION OF THE MODEL FOR PURPOSES OF 
STATISTICAL ANALYSIS 


For purposes of statistical analysis the model defined by equations 
(2), (3), and (4) may be specified more completely as follows: 

(5)    $C_t = \delta + \alpha Y_{t-1} + u_t$ 

(6)    $I_t = \eta + \beta(Y_{t-1} - Y_{t-2}) + v_t$ 

(7)    $A_t$ = exogenous 

(8)    $Y_t = A_t + C_t + I_t$ 
           $= A_t + (\delta + \eta) + \alpha Y_{t-1} + \beta Y_{t-1} - \beta Y_{t-2} + (u_t + v_t)$ 

or, 

(9)    $Y_t - A_t = \lambda + (\alpha + \beta) Y_{t-1} - \beta Y_{t-2} + w_t$ 

where 

$Y_t$ = aggregate real output in billions of dollars for time period t 
$C_t$ = aggregate real consumption in billions of dollars for time period t 
$I_t$ = aggregate real induced investment in billions of dollars for time period t 
$A_t$ = aggregate real autonomous investment in billions of dollars for time period t 
$\alpha$ = a parameter, the marginal propensity to consume 
$\beta$ = a parameter, the investment (accelerator) coefficient 
$\delta$ and $\eta$ = location parameters, and $\lambda = \delta + \eta$ 
$u_t$ and $v_t$ = random disturbances with zero means and variances $\sigma_u^2$ and $\sigma_v^2$ respectively 
$w_t$ = a random disturbance, since it is a linear combination of the random disturbances $u_t$ and $v_t$. 


The variables Y, C, and I are endogenous; A is considered to be an 
observable exogenous variable. The disturbances are each assumed to 
be serially noncorrelated. In the case of $w_t$, for example, this implies 
$E(w_t w_{t-\tau}) = 0$ for $\tau \neq 0$. 

The problem is to estimate the parameters $\lambda$, $(\alpha + \beta)$, and $\beta$ in 
equation (9), and to test the hypothesis $\beta \geq 1$, with the alternative being 
$\beta < 1$. A 0.05 critical region is to be used as the test criterion. 

Since the right side of equation (9) contains only predetermined 
variables, the classical least squares method may be expected to yield 
consistent estimates of the parameters $\lambda$, $(\alpha + \beta)$, and $\beta$.⁷ Having these 
estimates, the estimate of the “structural” parameter $\alpha$ may easily be 
determined. There are no serious identification problems in this simple 
model.⁸ 
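
The estimation step can be sketched as follows: regress $Y_t - A_t$ on a constant, $Y_{t-1}$, and $Y_{t-2}$, take $\beta$ as the negative of the coefficient on $Y_{t-2}$, and obtain $\alpha$ by subtracting $\beta$ from the coefficient on $Y_{t-1}$. The Python below is a minimal illustration that solves the normal equations directly. All function names are hypothetical, and the data are synthetic and generated without disturbances, so the check shows only that the algebra recovers known parameters; it does not reproduce the paper's data or results, and it omits the significance test on $\beta$.

```python
import random


def solve_linear(a, b):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_equation_9(y, a):
    """Least squares for Y_t - A_t = lam + (alpha + beta) Y_{t-1} - beta Y_{t-2} + w_t.

    y and a are aligned lists; the first two entries of y serve only as lags.
    """
    rows = [(1.0, y[t - 1], y[t - 2]) for t in range(2, len(y))]
    target = [y[t] - a[t] for t in range(2, len(y))]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * z for r, z in zip(rows, target)) for i in range(3)]
    lam, alpha_plus_beta, minus_beta = solve_linear(xtx, xty)
    beta = -minus_beta
    return {"lambda": lam, "alpha": alpha_plus_beta - beta, "beta": beta}


def simulate(lam, alpha, beta, a, y0, y1):
    """Generate Y_t from equation (9) with the disturbance set to zero."""
    y = [y0, y1]
    for t in range(2, len(a)):
        y.append(a[t] + lam + (alpha + beta) * y[t - 1] - beta * y[t - 2])
    return y


# Synthetic exogenous series: 2 lag years plus 13 estimation years (made-up values).
rng = random.Random(0)
a_series = [rng.uniform(15.0, 30.0) for _ in range(15)]
y_series = simulate(5.0, 0.6, 0.4, a_series, 70.0, 72.0)
estimates = fit_equation_9(y_series, a_series)
```

With only thirteen annual observations, as in the period studied, estimates from real data would carry substantial sampling error; the noise-free series here sidesteps that issue by design.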


III. THE BASIC DATA 


The real output ($Y_t$) series. The basic time series needed for a statistical 
analysis of equation (9) is a series on what Hicks calls real output for 
the economy as a whole. In the present study the United States 
Department of Commerce annual series on gross national product in 
constant (1939) dollars⁹ was used as the estimate of aggregate real output 
for the United States, 1929-1941. 

Since the model calls for a two-year lag in $Y_t$, estimates of real gross 
national product for the years 1927 and 1928 are necessary in order to 
get the model started. Approximate values of the Department of 
Commerce estimates for 1927 and 1928 were obtained from a chart showing 
“gross national product in 1939 dollars” published in the 1951 national 
income supplement to the Survey of Current Business.¹⁰ 

Beginning the analysis in 1929 was dictated by the basic data, since 
the revised Department of Commerce estimates begin with that year. 
Ending the analysis in 1941 was necessary because of the economic 
theory underlying the equations in the model. Both the consumption 
function and the induced investment mechanism are based on theory 
which presupposes a reasonably free-market economy. During a period 
like World War II when rigorous economic controls are in effect, the 
assumptions of the theory are not fulfilled, and therefore the model 
could not be expected to be applicable to such a period. 

It would have been desirable to have had quarterly or even monthly 
$Y_t$ data so that experiments could have been made using time lags of 
less than one year. But for the period 1929-1939 the Department of 
Commerce gross national product data in constant dollars are available 
only on an annual basis. The inability to use lags of less than one year 





7 For a technical discussion of this point see H. B. Mann and A. Wald, “On the Statistical Treat- 
ment of Linear Stochastic Difference Equations,” Econometrica, 11 (July-October, 1943), 173-220; and 
T. C. Koopmans, Statistical Inference in Dynamic Economic Models (New York: John Wiley and Sons, 
Inc., 1950), Chap. II, especially Section 3. For a less technical discussion see J. Tinbergen, Econometrics 
(Philadelphia: The Blakiston Company, 1951), pp. 83-84, 200-03. 

An estimate $\hat{\theta}$ of a parameter $\theta$ is said to be a consistent estimate if 
$P(|\hat{\theta} - \theta| < \epsilon) \to 1$ as $n \to \infty$, where $\epsilon$ is any arbitrarily small positive number 
and n is the size of the sample. 

8 The model is not completely identified since estimates of $\delta$ and $\eta$ cannot be determined from esti- 
mates of the parameters in equation (9). This, however, is not important for purposes of the present 
paper. 

For a technical discussion on the problem of identification see Koopmans, op. cit., pp. 69-110, 238- 
57. 

9 This series is found in U. S. Department of Commerce, A Supplement to the Survey of Current 
Business, National Income, 1951 Edition, p. 146. 

10 Ibid., p. 15. 
is a rather serious restriction, since many economists seem to think that 
the really significant time lags are shorter than a year.¹¹ From a 
statistical viewpoint, however, the utilization of monthly or quarterly 
data brings up difficult problems, for example the treatment of seasonal 
variation and the possibility of significant serial correlation in the 
disturbances. Difficulties like these have not been faced in most recent 
econometric studies; nor are they faced in the present paper. But as 
Klein has said “. . . these perplexing problems cannot be avoided 
indefinitely if we are to get satisfactory econometric models.”¹² 

The autonomous investment ($A_t$) series. In the model defined by equation 
(9) autonomous investment ($A_t$) is considered as an observable 
exogenous variable. Therefore a time series representing autonomous 
investment had to be derived. What is really called for here, of course, is 
a breakdown of total investment into autonomous and induced 
investment. That this cannot be done accurately must be granted 
immediately, especially in view of Hicks’ extremely vague definition of 
autonomous investment.¹³ Only a rough approximation is possible. 

The usual approach to the problem is to consider gross private do- 
mestic investment as induced investment and government expenditure 
plus net foreign investment as autonomous investment. In the present 
study certain items of gross private domestic investment were placed 
in the autonomous category. The reason for doing this is to attempt to 
make the statistical definition of autonomous investment conform more 
closely to Hicks’ theoretical definition, especially with regard to the 
so-called “long-range” component. After making these adjustments, the 
series used as the approximation to autonomous investment was de- 
rived by taking the sum of the following items: government expendi- 
ture, net foreign investment, residential building (non-farm), “other” 
nonresidential building, public utility construction, and residential 
farm construction.14

The resulting time series was taken as the estimate of A_t for the





11 E.g., see Ta-Chung Liu and Ching-Gwan Chang, “U. S. Consumption and Investment Propensities:
Prewar and Postwar,” American Economic Review, 40 (September 1950), 569.

12 Lawrence R. Klein, Economic Fluctuations in the United States (New York: John Wiley and Sons, 
1950), p. viii. 

13 See first page of this paper. Harrod’s writings on the subject of “long-range” investment fail to
provide any significant clarification. See his Towards a Dynamic Economics (London: Macmillan and
Company, Ltd., 1948), p. 79.

Hicks’ definition of autonomous investment may be criticized on grounds other than vagueness 
even to the point of questioning whether all of government expenditure should be considered as being 
autonomous. But to go into this matter thoroughly would be beyond the scope of the present paper. The 
objective here is to make a statistical analysis of Hicks’ model as it stands. 

14 The various time series on new private construction are given in U. S. Department of Commerce,
op. cit., p. 198. The construction series were converted into 1939 dollars by means of the construction
price index given in ibid., p. 146. Government expenditure and net foreign investment are given in 1939
dollars directly (loc. cit.).








546 AMERICAN STATISTICAL ASSOCIATION JOURNAL, SEPTEMBER 1952 


United States, 1929-41. Once again it should be emphasized that the
derived A_t series is only a rough approximation to Hicks’ theoretical
concept of autonomous investment; but it is perhaps a fairly reasonable
approximation, especially in view of the vagueness of Hicks’ definition.

The Y_t - A_t series. With the Y_t and A_t estimates available, the
Y_t - A_t series was computed by subtraction. The resulting series was used
as the statistical approximation to the quantity (Y_t - A_t) on the left
side of equation (9). These data, in billions of 1939 dollars, may be
summarized as follows:








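[Table only partly legible: of the column headings only “Year” and “A_t” survive, and figures
survive for 1934-41 only, in a single column.]

Year
1927
1928
1929
1930
1931
1932
1933
1934     67.9
1935     73.9
1936     83.9
1937     87.9
1938     84.0
1939     91.3
1940    100.0
1941    115.5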
IV. RESULTS OF THE STATISTICAL ANALYSIS


Estimation of the parameters A, (α+β), and β in equation (9) was
accomplished by using the classical least squares method. The results
of the statistical analysis gave the following estimating equation:15


(10)    (Y_t - A_t) = 14.25 + 1.22 Y_{t-1} - 0.56 Y_{t-2} + u_t
                              (0.23)         (0.28)
with 
Coefficient of multiple correlation: R=0.88 
(Adjusted for degrees of freedom) 


Standard error of estimate: S=$5.4 billions. 
(Adjusted for degrees of freedom) 


Since (α+β) = 1.22 and β = 0.56, the marginal propensity to consume
is determined to be α = 0.66. This compares favorably with estimates





15 The numbers in parentheses are the standard errors of the respective regression coefficients.
of α for the prewar period in the United States derived by other
investigators. Samuelson,16 for example, using a single consumption equation,
estimated α to be 0.56 for the period 1921-39; Haavelmo,17 using
systems of equations, obtained α = 0.67 and α = 0.71 (corresponding to
two different models) for the period 1922-41; and R. and W. M. Stone18
derived α = 0.75 for 1920-35 and α = 0.70 for 1919-35.
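
The arithmetic behind the marginal propensity to consume can be sketched as follows. This is only an illustration of the subtraction described in the text; the variable names are assumptions introduced here, and the comparison figures are simply the estimates quoted above.

```python
# Coefficients reported in equation (10).
alpha_plus_beta = 1.22   # coefficient on Y_{t-1}
beta = 0.56              # accelerator: minus the coefficient on Y_{t-2}

# Marginal propensity to consume implied by the two estimates.
alpha = alpha_plus_beta - beta

# Estimates by other investigators, as quoted in the text.
other_estimates = {
    "Samuelson, 1921-39": 0.56,
    "Haavelmo, 1922-41 (first model)": 0.67,
    "Haavelmo, 1922-41 (second model)": 0.71,
    "Stone, 1920-35": 0.75,
    "Stone, 1919-35": 0.70,
}

print(f"alpha = {alpha:.2f}")
for source, value in other_estimates.items():
    print(f"{source}: {value:.2f} (difference {alpha - value:+.2f})")
```

The value 0.66 lies inside the range 0.56 to 0.75 spanned by the quoted estimates, which is the sense in which it "compares favorably".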

The test on β. The test of the hypothesis that β ≥ 1 at the 0.05 level of
significance may be performed as follows, using the t distribution for
10 degrees of freedom:


    t_{0.05}(S_β) = 1.812(0.2787) = 0.5050
    Acceptance interval = 1 - 0.5050 to ∞
                        = 0.4950 to ∞.


Since the estimate β = 0.56 falls within the interval 0.4950 to ∞, the
hypothesis that β ≥ 1 is not rejected.

Thus on the basis of the sample used in this study there is no reason,
from a statistical inference standpoint, to reject Hicks’ hypothesis of
an explosive accelerator coefficient if the probability of a Type I error
is assumed to be 0.05.
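
The one-sided test above can be retraced numerically. The sketch below uses only the figures given in the text (the tabled value 1.812 for 10 degrees of freedom and the standard error 0.2787); the variable names are assumptions made for illustration.

```python
# Figures taken from the text.
t_crit = 1.812        # one-tailed 0.05 point of t with 10 degrees of freedom
se_beta = 0.2787      # standard error of the estimate of beta
beta_hat = 0.56       # estimated accelerator coefficient
null_value = 1.0      # boundary of the hypothesis beta >= 1

# Lower end of the acceptance interval for the estimate.
half_width = t_crit * se_beta
lower_bound = null_value - half_width

# Equivalent statement in terms of the t statistic.
t_stat = (beta_hat - null_value) / se_beta
reject = t_stat < -t_crit

print(f"acceptance interval: {lower_bound:.4f} to infinity")
print(f"t statistic: {t_stat:.3f}, reject beta >= 1: {reject}")
```

The estimate lies only about 0.065 above the lower bound, so the failure to reject reflects the large standard error rather than an estimate close to unity.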

Computation of the disturbances (u_t). Annual computed values for the
quantity (Y_t - A_t) may be derived from equation (10). Comparison of
these computed values with the actual (Y_t - A_t) over the period 1929-41
gives the u_t, in billions of 1939 dollars, as follows:








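[Table only partly legible: the actual (Y_t - A_t) figures survive only for 1929-31, without
their decimal digits, and the u_t column is lost.]

Year    (Y_t - A_t)    (Y_t - A_t)_c    u_t
1929    74.            67.8
1930    64.            73.5
1931    59.            61.0
1932                   58.3
1933                   49.0
1934                   54.3
1935                   62.3
1936                   66.0
1937                   74.8
1938                   74.1
1939                   67.1
1940                   78.1
1941                   84.6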
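
A rough consistency check on equation (10), offered as a sketch and not as the author's own computation. It assumes that the figures 67.9, 73.9, ..., 115.5 that survive for 1934-41 in the data summary of the preceding section are the Y_t series, which the source does not legibly label, and that the column of thirteen figures ending in 84.6 holds the computed values for 1929-41. Under those assumptions the rounded coefficients reproduce the computed values for 1936-41 to within about half a billion dollars, a gap that is plausibly due to rounding of the published coefficients.

```python
# ASSUMPTION: these are the Y_t figures (billions of 1939 dollars), 1934-41.
Y = {1934: 67.9, 1935: 73.9, 1936: 83.9, 1937: 87.9,
     1938: 84.0, 1939: 91.3, 1940: 100.0, 1941: 115.5}

# ASSUMPTION: computed (Y_t - A_t) values as tabulated in the paper, 1936-41.
tabulated = {1936: 66.0, 1937: 74.8, 1938: 74.1,
             1939: 67.1, 1940: 78.1, 1941: 84.6}

def computed_value(y_lag1, y_lag2):
    """Right-hand side of equation (10) without the disturbance term."""
    return 14.25 + 1.22 * y_lag1 - 0.56 * y_lag2

# Recompute each year from the two preceding Y values.
recomputed = {year: computed_value(Y[year - 1], Y[year - 2])
              for year in tabulated}
gaps = {year: recomputed[year] - tabulated[year] for year in tabulated}

for year in sorted(tabulated):
    print(f"{year}: recomputed {recomputed[year]:.1f}, "
          f"tabulated {tabulated[year]:.1f}, gap {gaps[year]:+.2f}")
```

Only 1936-41 can be checked this way, because the earlier Y_t entries needed for the lagged terms did not survive.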
16 “A Statistical Analysis of the Consumption Function” in A. H. Hansen, Fiscal Policy and Business
Cycles (New York: W. W. Norton and Company, Inc., 1941), p. 255.

17 T. Haavelmo, “Methods of Measuring the Marginal Propensity to Consume,” Journal of the
American Statistical Association, 42 (March, 1947), 119-21.

18 The results of the Stones’ investigation may be found in Tinbergen, op. cit., p. 97.
Σ u_t = 0


Root mean square of the u_t = $5.4 billions
(Adjusted for degrees of freedom) 


The disturbances on the average are fairly large relative to the magnitude
of the values in the (Y_t - A_t) series; but this is to be expected in
view of the relatively poor statistical fit. The variables Y_{t-1} and Y_{t-2}
“explain” only 77 per cent of the variation in (Y_t - A_t) over the period
1929-41.19
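
The 77 per cent figure follows from squaring the multiple correlation coefficient reported with equation (10). A minimal sketch, with variable names chosen here for illustration:

```python
# Coefficient of multiple correlation reported with equation (10),
# adjusted for degrees of freedom.
R = 0.88

share_explained = R ** 2              # proportion of variation "explained"
share_unexplained = 1 - share_explained

print(f"{share_explained:.0%} explained, {share_unexplained:.0%} unexplained")
```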

To obtain a rough check on the assumption of serial noncorrelation
in the u_t, serial correlation coefficients for lags from one to six years
were computed. The results are as follows:


Lag (years)      1        2        3        4        5        6
r             -0.041   -0.021   -0.219   -0.578   +0.206   +0.157


Using 0.05 significance points as test criteria,20 none of these serial
correlation coefficients is significant except r for lag 4. Thus the assumption
that the u_t are serially noncorrelated is not entirely fulfilled, but
apparently the serial correlation is not as great as might be expected on
the basis of a casual glance at the computed u_t series.
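
A serial correlation coefficient of the kind tabulated above might be computed as sketched below. The paper does not state which definition was used; because the significance points are taken from R. L. Anderson's paper, the circular definition is assumed here. The function name and the sample series are hypothetical, and the paper's own u_t values are not reproduced.

```python
def circular_serial_correlation(values, lag):
    """Circular serial correlation: the series is treated as wrapping
    around, so the last observations are paired with the first ones."""
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    denominator = sum(d * d for d in deviations)
    numerator = sum(deviations[i] * deviations[(i + lag) % n]
                    for i in range(n))
    return numerator / denominator

# Hypothetical residuals, for illustration only (not the paper's u_t).
residuals = [1.0, 2.0, 3.0, 4.0]
r = {lag: circular_serial_correlation(residuals, lag) for lag in range(3)}
print(r)
```

With thirteen annual residuals, as in the paper, the same function would be called with lags 1 to 6.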


V. CONCLUDING REMARKS 


As previously pointed out there are several weaknesses in the analy- 
sis presented in this paper, e.g., the somewhat arbitrary separation of 
induced and autonomous investment, the use of annual gross national 
product data and hence the inability to experiment with time lags of 
less than one year, and the possibility of some degree of serial correla- 
tion in the disturbances. 

Subject to these limitations the major conclusions from the statisti- 
cal analysis may be summarized briefly as follows: 





19 This implies that Hicks’ model does not offer an outstanding explanation of the experience in
the United States, 1929-41. Through experimentation with other models the present author has found
that introducing profits as an additional variable in the induced investment function will result in a
markedly improved statistical fit. Other investigators have obtained similar results; e.g., see Liu and
Chang, op. cit.; and also J. Tinbergen, Statistical Testing of Business-Cycle Theories: A Method and its
Application to Investment Activity (Geneva: League of Nations Economic Intelligence Service, 1939). In
fact it is now rather generally accepted that defining the investment function solely in terms of the
“acceleration principle” is a questionable procedure. There are just too many other relevant variables.
Tinbergen is especially emphatic on this point.

20 Significance points were obtained from R. L. Anderson, “Distribution of the Serial Correlation
Coefficient,” The Annals of Mathematical Statistics, 13 (March, 1942), 8.
(1) Apparently the Hicksian elementary case model does not provide
a very good explanation of the prewar experience in the
United States, 1929-41. The variables Y_{t-1} and Y_{t-2} “explain”
only 77 per cent of the variation in (Y_t - A_t) over the period
1929-41.

(2) Although at first glance the estimate β = 0.56 appears to be
substantially less than unity, the relatively large standard error of β
leads to acceptance of the hypothesis that β ≥ 1, using the 0.05
significance level of the test.21 Thus on the basis of the particular
sample used in the present study and assuming the probability
of a Type I error to be 0.05, there is no reason to reject the
Hicksian hypothesis of an explosive accelerator coefficient.


From the formulation of Hicks’ model given in section I of this paper 
it is apparent that the restrictions on the operation of the linear aggre- 
gate real output equation have the effect of making the model non- 
linear. The question may be raised as to whether the linear statistical 
analysis presented in sections II-IV actually represents a statistical fit 
for Hicks’ model, especially in view of his “real ceiling” hypothesis. For 
the period under analysis in the present paper, 1929-41, there can be 
little argument—with the possible exception of 1929—that the econ- 
omy was at no time banging up against a real ceiling on aggregate real 
output. Even in 1929 it is very doubtful that the real ceiling was 
reached. In fact some investigators doubt that any boom in the United 
States has been brought to an end in the way Hicks suggests.22 Apparently,
then, there is considerable justification for ignoring the effect
of the real ceiling for the period 1929-41. 





21 Regarding large standard errors, Tinbergen has written the following: “We should not forget,
while talking so much about the statistical significance and the standard deviations of our regression
coefficients, that even if these standard deviations are large, the most probable value for the regression
coefficient is the value we calculate. If no other information is available we can hardly avoid taking the
regression coefficient we find, however uncertain it may be.” See J. Tinbergen, “Econometric Business
Cycle Research,” Readings in Business Cycle Theory (Philadelphia: The Blakiston Co., 1944), pp. 81-2.

22 E.g., James S. Duesenberry, “Hicks on the Trade Cycle,” The Quarterly Journal of Economics, 64
(August, 1950), 468-73. Also see S. S. Alexander, “Issues of Business Cycle Theory Raised by Mr.
Hicks,” The American Economic Review, 41 (December, 1951), 874.
APPLIED STATISTICS is a new journal, edited by L. H. C. Tippett, to
be issued three times a year for the Royal Statistical Society by Oliver
and Boyd, Ltd. “The articles to be published will include surveys in non-
[...] ple terms of the modern developments in statistical theory and method that
[...] that arise in applying statistics and of practical methods of the computation
and presentation of results. Other features will include reviews of books, let-
[...] ers. The activities of the Industrial Applications Section and the Study
Section of the Royal Statistical Society will be reported.”

The first issue contains a Foreword by A. Bradford Hill, President of the
[...] Qualification in Sample Design, by Thomas Corlett; A Statistical Approach
to the Specification of Plastics, by Charles Wainwright; Uses and Abuses 
of Factor Analysis, by Hans J. Eysenck; The Accuracy of Systematic Sam- 
pling from Conveyor Belts, by Geoffrey H. Jowett. The Questions and An- 
swers department includes three questions, with answers by G. Salter, L. T. 
Wilkins, and G. H. Jowett. There are two pages of notes and comments, nine 
on the activities of the Industrial Applications Section of the Royal Statis- 
tical Society, and six on the activities of the Study Section. One book review 
is published; it is by L. T. Wilkins and deals with M. J. Moroney’s Facts 
from Figures, a Pelican book now being widely distributed on newsstands in
England.

W. A. W.


Everyday Statistics. George W. Snedecor. Dubuque, Iowa: William C. Brown 
Company, 1950. Pp. iv, 170. $3.00. Spiral bound, paper. 


L. H. C. Tippett, British Cotton Industry Research Association 


THE author describes this book as “frankly experimental”; this reviewer
cannot make up his mind about it. Here he records his conflicting impres- 
sions. 

The main subjects of the book are: the ideas of sampling, probability, and
the frequency distribution; the elementary theory and practical significance
of the binomial, normal, and non-normal distributions; elementary sampling
theory, and tests of significance and confidence statements based on the chi- 
square, t- and F-distributions; sampling techniques; regression, correlation, 
and the analysis of variation; the use of life tables for making insurance cal- 
culations. Only the very elements of the subjects are introduced, as must be 
the case in a volume of only 170 pages, but the reader is shown how to make 
the necessary calculations and to use the statistical tables provided, and 
there are very many examples and exercises. Through this book a working, 
if elementary, knowledge of the subject may be attained. 

So described, this might appear to be merely another elementary text 
book, but it is more distinguished than that. Professor Snedecor has in 
mind the layman and so he introduces each bit of statistics in terms of some 
everyday problem, and explains the ideas underlying the statistical approach 
in everyday terms. And what a fascinating set of problems they are! Teachers 
and lecturers will value this feature of the book. The exposition is unhurried 
and the working of the illustrative examples carried through step by step, 
with full explanation of the meaning of each step. The author takes the 
reader on a series of excursions of rather exciting discovery. The book also 
is balanced in its parts so that anyone who can read the first few chapters 
should be able to work his way through to the end. 

All this is excellent, but the reviewer has doubts that he cannot still. 
Although “no mathematics beyond elementary algebra is required” and 
“the attempt is made to present the logic of the science with only so much 
mathematical symbolism as is necessary for clarity,” the “lay reader” has 
to learn, and learn quickly, a formidable amount of what he must regard as 
jargon. The use of “hypothesis,” “confidence intervals,” “chi-square,” 
“degrees of freedom,” and so on may be unavoidable in dealing with “every- 
day statistics,” but they are not everyday words. By the time he is able to 
use these with the facility necessary to read this book the lay reader has 
certainly lost his Eden-like innocence. On p. 17 he is assumed to be familiar 
with the word “parameter.” And on p. 15 there is reference to the fallacy
of stating “The probability is 0.95 that μ (a population parameter) lies
within the (given) confidence interval”: that is certainly a fallacy, but not 
a layman’s fallacy. 

Under the title “Everyday Statistics” and the sub-title “Facts and Falla- 
cies” the general reader might expect to learn something about weighted 
averages, percentages, index numbers, and time charts; and such fallacies 
as mistaking seasonal or random fluctuations for trends, and correlations, 
particularly spurious correlations in time, for causal relationships. These 
are the very stuff of everyday statistics but they find little or no mention 
in this book. 

However, we may remind ourselves that the book is experimental, as its 
format shows; although very well done, it is reproduced from typescript.
It is a very interesting and valuable experiment in exposition; perhaps 
Professor Snedecor intends to write a more definitive edition after he has 
studied the reactions of people to this experimental one. This reviewer 
hopes so. And he suggests that the reactions of the readers for whom the 
book is intended are more important than the crabbing criticisms of a 
reviewer who is also a professional statistician. 


Statistical Design and Analysis of Experiments for Development Research.
Donald Statler Villars. Dubuque: Wm. C. Brown Company, 1951. Pp. xvii, 455. 
$6.50. 


KENNETH J. ARNOLD, Michigan State College 


“THIS book has been written for the purpose of summarizing for the busy
research worker or executive the fundamental principles involved in
the design of efficient experiments,” Villars says in his preface, adding later, 
“The book is purposely made brief so that the busy worker can read and be 
done with the essential parts and use the methods without having to go any 
deeper into the subject than he desires. A journalistic order of presentation 
is used, so that one may almost immediately start applying variance analysis 
to his own data.” 

Chapter headings give a good idea of the contents of the book: Introduc- 
tion, Use of the calculating machine, Student’s t-test, Distributions and sta- 
tistical significance, Analysis of variance, Principles of design, Designs 
available, Analysis of covariance, Discontinuous data, The control chart, 
Sequential analysis, Miscellaneous refinements, Efficiency studies, Funda- 
mental theory. Ten appendices are devoted to tables and charts, some of 
which are presented at the appropriate place in the body of the text as 
well as in the appendices. Answers are given to problems which appear at 
the ends of some chapters. A glossary of terms forms the final appendix. 

In accordance with the journalistic approach the reader is first introduced 
to methods, which are supported by a minimum of rationale. References 
to later sections in which more background material may be found are 
frequent. There are also many references to papers and to other books in 
which the reader may find further development of the subject under discus- 
sion or a more complete discussion of the reasons for using certain procedures. 
In some sections of the book such references are essential. The author has 
sometimes been so brief as to be unintelligible. The résumé (pages 281, 282) 
of the early part of Wallis’ paper on “Compounding Probabilities from 
Independent Significance Tests,” Econometrica, 10 (1942), pp. 229-248, is at 
least misleading. The implications of several statements (pages 255-257) 
about sequential analysis are incorrect. The reviewer must also admit that 
he was unable to verify the variance analysis which appears on page 139, 
a task to which the author invites the reader. It is discouraging to note that 
the rules for rounding given on pages 18 and 19 have not been followed
in some of the derived figures given along with the original data on which 
the analysis is said to be based. 

In most books about statistics, Type II errors are given little, if any, 
consideration. Dr. Villars does offer a short discussion in the course of which 
he states, (page 55), that “Following Fisher, we avoid making a Type II 
error by abstaining from accepting null hypotheses uncontradicted by the 
data.” It would appear that, in most applications of statistics, most present 
day statisticians believe they hold this point of view. However, without 
Type II errors the author has a difficult time in his chapter entitled “Efficiency
Studies.” In the early part of the chapter, he defines “efficiency of
experimentation as the quotient of the minimum amount of information 
detectable as significant divided by the cost.” Thus efficiency depends, as 
the author points out, on a standard of treatment variance which, for a 
fixed type of design, becomes more strict as the degree of replication is 
increased. Later in the chapter, in comparing designs, the author changes 
to a more familiar definition of relative efficiency which depends on the 
components of variance found in a previous experiment. 

Dr. Villars, with his background as a research chemist, has produced a 
book which will be appreciated by men in research and development both 
by its choice of topics and its method of presentation. His discussions of 
models for the analysis of variance and of getting from the model to the 
appropriate analysis are stimulating features of his presentation. It is to be 
hoped that a few sections of the work can be clarified in a second edition. 


Statistische Methoden für Naturwissenschafter, Mediziner, und Ingenieure
(Statistical Methods for Natural Scientists, Medical Men, and Engineers). A.
Linder. Basel: Verlag Birkhäuser, 1951. Pp. 238.


Two reviews follow: 


JOHN W. TUKEY, Princeton University


THIS book is without direct parallel in English. It is written for readers
who are mature in approach, yet willing to begin with examples and 
algebraically formulated recipes in view of the mathematical theory which 
follows in the last third of the book. It appears in a series of “textbooks 
and monographs in exact science,” and in format and tone gives that
appearance of clarity, precision, and approachableness which we would 
expect in the best German texts in such a series. It seems to have as good 
a chance of selling an exact scientist on statistics (if he reads German easily 
and is willing to take certain things on faith) as any book I know. 
Scope and coverage run as follows: 7 pages of introduction, 60 pages of 
descriptive statistics, going as far as multiple regression with up to 4 vari- 
ables, discriminant functions and Mahalanobis’ generalized distance, 43 
pages on statistical tests going as far as the testing of partial correlations,
discriminant functions and generalized distances, 18 pages on the analysis
of variance going beyond double classification, 75 pages of distribution
theory in which the standard distributions of normal theory (t, χ², F, correlation
coefficient) are derived as are their detailed applications to almost all
of the test procedures given previously, 3 pages of references, and 12 pages 
of tables. 

For one thin book (the paper is thin and high gloss) this is tremendous 
coverage, yet the basic material is there, with no liberties in treatment. 
Clearly much time has gone into rethinking and rewriting. If to Fisher’s 
Statistical Methods for Research Workers were added material on discrimina- 
tion, generalized distance, and the derivations of the standard distributions 
and their applications to the standard tests, and then if the book were re- 
written for mathematically trained scientists rather than English biologists 
of the 1920’s, one would have the skeleton of Linder’s book. 

All books contain points with which the respective reviewer disagrees. 
Here they are so few that all that were found can be mentioned. At page 16 
the classical virtues of mean, variance, regression and correlation coefficients, 
etc., are stated without the necessary and important limitation to an under- 
lying normal distribution. At page 44 we are told that partial regression 
coefficients correspond to holding the auxiliary variable fixed, instead of a 
more precise statement (that we have allowed for the linear effect of the 
auxiliary variable, or that we are holding the sample distribution of values 
of the auxiliary variable fixed, for example). At the top of page 89 and again
at page 139, t tests among arbitrary pairs among several means are made
without warning about the effect on the significance level. At page 117 
we test the significance of a discriminant function without regard to the 
fact that the three variables involved have been selected from 5 (or maybe 
9). At page 140 we are told that normality and constant variance, rather 
than additivity, are the goals of transformations in the analysis of variance. 
Although the figures are almost uniformly excellent, the labelling of the 
two curves on figure 26, page 102, is inadequate, since it is very difficult to 
tell whether they cross 0, 1, or 2 times. 
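
The concern about t tests among arbitrary pairs of means can be illustrated with a simple calculation. The sketch below is not from the book under review or from the reviewer; it assumes, purely for simplicity, that the pairwise tests are independent (they are not, since they share means), so the figures indicate only the order of magnitude of the effect on the overall significance level. The function name is hypothetical.

```python
from math import comb

def chance_of_some_false_rejection(number_of_means, level=0.05):
    """Probability of at least one rejection at the nominal level when
    every pair of means is tested and no real differences exist.
    ASSUMPTION: the pairwise tests are treated as independent."""
    number_of_pairs = comb(number_of_means, 2)
    return 1 - (1 - level) ** number_of_pairs

for k in (2, 3, 5, 8):
    print(f"{k} means, {comb(k, 2)} pairs: "
          f"{chance_of_some_false_rejection(k):.3f}")
```

With five means there are ten pairs, and the chance of at least one nominally significant difference is already about 0.40 under this simplification.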

Only in the case of the multiple t-testing situation can the reviewer point
to books in use which avoid the misinterpretations mentioned above. 
This makes the record of Linder’s book rather high. 

As positive distinctions, we must emphasize Linder’s use of r², the coefficient
of determination, as the fundamental notion, with its own symbol,
instead of the coefficient of correlation, and the extension of scope beyond 
the usual bounds to cover discrimination and generalized distance. The 
reader who absorbs the contents of this book will have a working grasp 
of the classical methods of analyzing data, and an understanding of the 
derivation of the corresponding sampling distributions on the null hypothe- 
ses. This reviewer’s acquaintance with the statistical literature in German 
is limited, but he suspects that no competitive book is at all comparable.
In this case, Linder’s book should do much to make the classical methods
used and understood among German-speaking scientists and engineers.


J. WOLFOWITZ, Cornell University


THE purpose of the author is to make available to German-speaking readers
the advances in statistics which, he says, were made in recent years in 
Great Britain and the United States. It turns out that his book is essentially 
an exegesis of the methods in Fisher’s Statistical Methods for Research Workers 
and in some related papers. The usual tests are described, and mathematical 
proofs are given. The idea of the power of a test is never mentioned. There 
is no discussion of interval estimation. Needless to say, there is also no 
mention of sequential analysis or decision functions. 
This would have been a good text book fifteen years ago. It must now be 
considered out of date, along with several books in English of this kind. 


Basic Methods of Marketing Research. James H. Lorie (Associate Professor of 
Marketing, University of Chicago) and Harry V. Roberts (Assistant Professor of 
Statistics, University of Chicago). New York: McGraw-Hill Book Company, 
Inc., 1951. Pp. xii, 453. $6.00. 
PAUL W. STEWART AND A. B. BLANKENSHIP,
Stewart, Dougall & Associates, Inc. 


THIS is a sound book. It appears to be the best of the postwar texts on
marketing research. It is comprehensive. Its five sections include an
Introduction, Scientific Method in Marketing Research, Sampling, Com- 
munication and Observation, and Administration. Of these, major emphasis 
is given to Sampling and to Communication and Observation, each of 
which accounts for one-fourth of the text. 

These are also the strongest sections of the text. The section on Sampling 
is by all odds the best non-technical discussion of this subject the reviewers 
have seen. It is written for easy understanding, and omits the usual formulas 
—so difficult for the beginning student to grasp. In place of these formulas 
are step-by-step descriptions of the various considerations in sampling. The 
exposition is lucid, and easy for persons of limited mathematical background 
to follow. Statistical mumbo-jumbo is avoided. One appealing point is the 
authors’ avoidance of subscribing to any one sampling method as best in 
all situations. The authors realize that every kind of sampling plan has its
place. 

The section on Communication and Observation might have been given 
a better title. This refers to the “how” of getting the field information, the 
methods of obtaining information, whether by questioning or observation. 
The discussion naturally concerns questionnaire formulation as its primary 
subject. In this field, too, the text is strong. The authors masterfully handle 
the complex problems of questionnaire construction. 

No book has ever been perfect; the present one is no exception. The
section on scientific method is good, but does not always hang together. 
There are frequent redundancies and clumsy sentence structures. Most 
paragraphs—throughout the entire text—are quite long, making this a 
fatiguing book. But these criticisms are directed against the editors, not the 
authors. A better editing job would have corrected these minor deficiencies. 

Perhaps the most serious criticism is from the standpoint of a research 
practitioner. There is not enough emphasis on the practical problems of 
conducting marketing research. For instance, the section on scientific 
method is scarcely sufficient for the student to understand the practical 
difficulties of problem definition. Nor, on the other hand, is there sufficient 
discussion of the nature of the report to management. And the minor but 
time-consuming tasks of tabulation, machine-running, and statistical 
treatment of results are all minimized in their treatment. 

None of these minor drawbacks places the book second to any in its 
field. If the reviewers were teaching a course in marketing research, this 
would be their basic text. 


Committee Decisions with Complementary Valuation. Duncan Black and R. A. 
Newing. London: William Hodge and Company, Limited, 1952. Pp. vii, 59. 10s/6d.


Harry Markowitz, The Rand Corporation 


THIS book is concerned with a “committee procedure ... encountered in
practice.” Specifically, “... every motion enters the voting process
once, and continues to be put against other motions until it is either defeated
by another motion or finally emerges as the decision of the committee.”1 The
principal questions which the book answers are: Suppose that there are three
individuals on the committee, under what conditions (concerning the rankings
of the various proposals by the members of the committee) does there
exist a “motion” which has a (pairwise) majority over every other alternative?
How can we characterize such a proposal? Clearly such a proposal if
it existed and if it were considered would be chosen by the committee 
procedure described above. 
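
For the finite case, the search for a motion with a pairwise majority over every other motion can be sketched as follows. This is an illustration of the question posed above, not the method of Black and Newing; the function names and the two sets of rankings are hypothetical.

```python
def prefers(ranking, a, b):
    """True if the member with this ranking places motion a above motion b."""
    return ranking.index(a) < ranking.index(b)

def majority_motion(rankings):
    """Return the motion that beats every other motion in a pairwise
    majority vote, or None if no such motion exists."""
    motions = rankings[0]
    needed = len(rankings) / 2
    for candidate in motions:
        beats_all = all(
            sum(prefers(r, candidate, other) for r in rankings) > needed
            for other in motions if other != candidate
        )
        if beats_all:
            return candidate
    return None

# Hypothetical three-member committees ranking motions a, b, c (best first).
agreeing = [["a", "b", "c"], ["b", "a", "c"], ["b", "c", "a"]]
cyclic = [["a", "b", "c"], ["b", "c", "a"], ["c", "a", "b"]]

print(majority_motion(agreeing))  # b beats a and c in pairwise votes
print(majority_motion(cyclic))    # no motion beats both others
```

The second committee shows why conditions on the members' rankings are needed: with cyclic preferences no motion has a majority over every other.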

The above questions are answered for the following cases: 1) There is a 
finite number of proposals or motions; 2) there is an infinite number of 
proposals and the preferences of each individual can be described by a 
continuous unimodal (utility) function of one (decision) variable; 3) there 
is an infinite number of alternatives and the preferences of each individual 
can be described by a continuous unimodal function of two variables (e.g., 
two aspects of the alternatives). Since the preferences among various possi- 
bilities concerning one aspect of the decision to be made may depend on 
the anticipated decision concerning the other aspect, this case is referred to 
as that with “complementary valuations.” The writers do not make clear 





1 Black and Newing, op. cit., p. 1. 
how their pairwise voting procedure applies when there is a continuous
(or even a countably infinite) number of alternatives. The idea of a proposal 
which is (pairwise) superior to every other proposal does not lead, however, 
to any conceptual difficulties even in the infinite case. 

The authors also deal with questions as to the stability of the committee 
procedures in question. Under what conditions are there proposals which 
are equilibrium (or stable equilibrium) positions in the sense that once the 
committee has arrived at (or sufficiently near to) such a motion it will 
remain at (or converge towards) this proposal? 

Within the scope of the book—and this is a scope which the authors 
clearly delineate at an early stage—this work deals with the subject in a 
clear and scholarly fashion. The authors give a thorough account of the 
matters outlined above. There are, however, a number of questions which the 
authors may wish to consider in some future more definitive account. 

(1) What is the purpose of the results? Are these results to be used as a 
basis for finding “best” proposals in actual situations, or are there certain 
qualitative aspects of the results which are of interest in themselves? 

(2) The subject matter of this book comes under the general heading of 
the so-called “social welfare function.”² A social welfare function is a rule 
which gives a “community ranking” of various alternatives as a function of 
the various individuals’ rankings of these alternatives. Any voting system, 
for example, is in effect based on some social welfare function. What is the 
relation between this social welfare function and others? For example: 
(a) It can be shown that the pairwise voting system does not always give 
the same answer as voting on all proposals at the same time. (b) We know 
that this welfare function, like all welfare functions, somehow or other con- 
tradicts the Arrow “reasonability” conditions. In what way does it contra- 
dict these conditions and does this appear to be an argument against the 
desirability of the particular welfare function or an argument against the 
Arrow reasonability conditions? 
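
Point (a) can be illustrated with a small sketch. The seven hypothetical rankings below are constructed so that voting on all proposals at once (here read as plurality voting on first choices, which is one possible interpretation) selects a different proposal from pairwise voting.

```python
from collections import Counter

def plurality_winner(rankings):
    """Proposal with the most first-place votes."""
    first_choices = Counter(r[0] for r in rankings)
    return first_choices.most_common(1)[0][0]

def beats_all_pairwise(candidate, rankings):
    """True if the candidate has a strict majority over every other proposal."""
    others = [p for p in rankings[0] if p != candidate]
    return all(
        2 * sum(r.index(candidate) < r.index(o) for r in rankings) > len(rankings)
        for o in others
    )

# Hypothetical seven-member committee.
rankings = (
    3 * [["a", "b", "c"]]
    + 2 * [["b", "c", "a"]]
    + 2 * [["c", "b", "a"]]
)

print(plurality_winner(rankings))         # first-choice vote
print(beats_all_pairwise("b", rankings))  # pairwise vote
```

Here "a" has the most first choices, yet "b" defeats both "a" and "c" when the proposals are taken two at a time.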

(3) Eventually we want to know how the interesting qualitative results 
or computational procedures apply when there are more than three indi- 
viduals on the committee. 


The Facts of Life from Birth to Death. Louis J. Dublin (Second Vice-President 
and Statistician, Metropolitan Life Insurance Company) in collaboration with 
Mortimer Spiegelman (Assistant Statistician, Metropolitan Life Insurance Com- 
pany). New York: The Macmillan Company, 1951. Pp. x, 461. $4.95. 


Thomas N. E. Greville, National Office of Vital Statistics 


THIS book presents, in question and answer form, a surprisingly wide 
array of facts and figures concerning “man’s health and welfare.” A 
good idea of its scope is given by the chapter headings: Who We Are—The 





2 See, for example, Kenneth J. Arrow, Social Choice and Individual Values, Cowles Commission 
Monograph 12, John Wiley & Sons, 1951. 
Population, The Pattern of Reproduction, The Pattern of Marriage, The 
Average American Family, Marital Dissolution by Divorce, Separation, 
and Death, The Sick and Their Care, Mortality—in General, The Conquest 
of Tuberculosis, The Mystery of Cancer, The Control of Diabetes, A Dimin- 
ishing Burden—Pneumonia and Influenza, An Increasing Burden—The 
Circulatory Diseases, The Problem of the Rheumatic Diseases, The Hazards 
of Infancy and Childhood, Our Old People, The Accident Toll, Suicide and 
Homicide, The Labor Force and the Hazards of Occupation, Human Im- 
pairments, Mental Health, Other Diseases of Public Interest, Our Body 
Build, The Public Health and Its Administration, How Long We Live, 
The Effects of War. Altogether 1,100 questions are listed, but as 13 are 
merely cross-references to questions answered in other chapters, there are 
actually 1,087 answers. The smallest number of questions in any chapter is 
36 on pneumonia and influenza; the largest is 56 on accidents. The length 
of the answers varies from a single short sentence to almost two pages, the 
average being between one-third and one-half a page. It will probably not 
occasion surprise that the longest answer is to the question “What is a life 
table?” 

Illustrative of the types of questions are the following, taken from the 
jacket: How many “war brides” were there? Are men safer drivers than 
women? What is the cost of being born? Does mental disease run in families? 
Are the more educated people less likely to marry? At what age is a single 
man considered a bachelor? The author has carried on, over a period of 
forty years, extensive correspondence on all the general subjects dealt with, 
and many of the questions and answers are taken directly from his files, 
but these have been supplemented by additional ones to give a well rounded 
presentation of each subject. The style is lucid and vigorous, and the matters 
discussed are so packed with human interest that, having once picked up 
the book, it is extremely difficult to put it down, one’s attention being led 
from one topic to another. While the presentation is popular, and highly 
technical terms are either explained or avoided, the information given is 
timely and accurate, no space being wasted on vague generalities and 
platitudes. While this is primarily a book of facts, the author has not hesi- 
tated to cite well supported expert opinion when actual facts were not 
available. Nor has any attempt been made to avoid touching on subjects 
which may have controversial implications, such as, for example, the political 
and social implications of the aging of our population, and the number and 
distribution of physicians in the United States. While much space is devoted 
to presenting simple and easily understood facts and to clearing up popular 
misconceptions, such relatively difficult concepts as the net reproduction 
rate and the true rate of natural increase are explained. While many figures 
are given, the treatment is not primarily statistical, and tables are few. 

Altogether, this book contains a mine of information of great value to 
the demographer, statistician, sociologist, and public health worker. It is 
evident that a staggering amount of research was necessary to bring this 
material together. In the two months the reviewer has had a copy, he has 
found it invaluable in answering inquiries that have come to his desk. While 
it obviously could not, in a little over 400 pages, deal exhaustively with all 
the subjects treated, a carefully selected bibliography on each chapter 
directs the reader to sources of further information. A reasonably complete 
index helps to locate the information desired. 

In a book covering so wide a range of subject-matter, it is always possible 
to find something to criticize. In the opinion of the reviewer, the reference 
on page 249 to “mechanical suffocation” as the principal cause of death 
of children under five could properly have been discussed more fully, with 
perhaps a stronger indication of the doubt raised by some recent investiga- 
tions as to the correctness of this diagnosis in many of the cases reported. 
The reviewer believes that in future editions of this book considerable effort 
could profitably be expended on broadening the coverage of the index. This 
is fairly complete, and would undoubtedly suffice for the ordinary kind of 
book. However, when information on so wide a variety of topics is presented 
in the form of separate questions and answers, the classification by chapters 
is of necessity somewhat arbitrary, and the reader seeking specific informa- 
tion is at the mercy of the index more than with the usual type of book. The 
reviewer found he often had to think of several synonyms of the term that 
first occurred to him before finding one listed in the index. Some terms not 
appearing there, but on which information is given in the book are: de- 
formity, obesity, orthopedic defects, mechanical suffocation, and meno- 
pause. This suggestion is made with hesitancy, as the reviewer is well aware 
from his own experience that the preparation of an index can be extremely 
time-consuming, and that omissions will always be found, however great 
the effort to secure complete coverage. 


Migration Within Ohio, 1935-40. A Study in the Re-Distribution of Population. 
Warren S. Thompson. Oxford, Ohio: Scripps Foundation for Research in Popula- 
tion Problems, Miami University, 1951. Pp. ix, 227. $1.00. Paper. 
Howard Whipple Green, Director, Cleveland Real Property Inventory, 
and Secretary, Cleveland Health Council 


THIS study presents data on migration between subregions within the 
State of Ohio. It is based upon special tabulations prepared from the 
1940 Census schedules. The question “In what place did this person live 
on April 1, 1935?” included in the 1940 schedule, made such a study possible. 
The stream of migration within and between various subregions in Ohio 
and the characteristics of the persons who made these moves are presented 
in great detail. 
The subregions recognized in most tabulations are as follows: 
(a) Metropolitan subregions each divided into central city and remainder of 
county (in the case of Cleveland and Youngstown, each of two counties). 
The remainder of the county, that is, the area outside each central city, is 
called its “ring.” The types of communities of which each “ring” is com- 
posed are other urban, rural-nonfarm, and rural-farm. 

(b) Nonmetropolitan subregions each consisting of groups of counties. Each 
subregion is broken down into urban, rural-nonfarm, and rural-farm. 


The 88 counties of Ohio are included in 8 metropolitan subregions each 
with its “ring” and in 6 nonmetropolitan subregions. 

After analyzing movements of persons between the various subregions, 
the author analyzes the migrants by sex, age, marital status, education, employ- 
ment status and occupation, and wage and salary income. 

It is imperative that the reader study the introduction with great care, 
since it is the key to the entire study. 

The author warns the reader that the patterns of behavior of migrants 
within Ohio must be compared with those within other comparable states 
before it can be assumed that they are generally applicable. 

It must also be kept in mind that only part of the total migration is 
measured since between 1935 and 1940 many persons moved from here to 
there and to some other place, while the data analyzed refer to the net of 
these many movements which are constantly taking place in large metro- 
politan communities and probably in rural sections as well. 

The appendix shows the number of males and females living on April 1, 
1935, in each subregion and in each type of community, central city, other 
urban, rural-nonfarm, and rural-farm who moved to each of the other sub- 
regions by April 1, 1940. This table allows the student interested in but one 
Ohio community to study in detail some of the in-migration and out- 
migration characteristics of its migrants. 

It is regrettable that similar tables showing age, education, and so forth 
were not included in the appendix, even though many pages would have 
been required for their presentation. 

The study is profusely illustrated with small tables, charts, and maps. 


The Cost of Sickness and the Price of Health. C. E. A. Winslow. Geneva: World 
Health Organization, 1951. Pp. 106. $1.50. Paper. 


Eli Ginzberg, Columbia University 


THIS monograph, written by the Professor Emeritus of Public Health at 
Yale University for the World Health Organization, is distinguished for 
several reasons. It contains a large number of concise evaluations of empirical 
investigations which have been carried out in various parts of the world on 
the interrelations of poverty and disease. The author presents his materials 
in a challenging fashion, he illuminates them with incisive insights, and he 
adds to the value of his review by appending a large number of important 
bibliographical references. The monograph was prepared specifically for the 
Fifth World Health Assembly as a basis for discussion on the “Economic 
Value of Preventive Medicine,” the first subject on the agenda. 

The press release of the World Health Organization which announced 
the publication of the monograph stated that the keynote of the study is 
that “prevention is not only better than cure, it is also cheaper.” It further 
stated that this has long been recognized by public health workers, yet 
ignored by some economists. It concludes with the comment that the study 
is addressed not only to the medical man, but also to the economist, the 
politician, the sociologist, and all others interested in the problems of modern 
society. This evaluation of the monograph, written from the point of view 
of a social scientist with a broad interest in medical economics, will concen- 
trate on the larger issues which the author presents rather than on statistical 
methods and analysis. 

Winslow takes as his point of departure the desirability of each country’s 
analyzing the most vital health problems which face it and determining 
those that can be attacked with maximum results at minimum cost. His 
second assumption is the desirability of developing cooperative programs 
of technical assistance in which more fortunate areas cooperate with those 
less advanced for the common goal of a healthful, prosperous and peaceful 
world. This reviewer was pleased to see some reference to a cost-benefit 
approach at the very outset of the monograph since he has long been dis- 
turbed by medical uplift literature that deals with social desiderata without 
reference to cost. 

In his first chapter, “The Cost of Sickness,” Winslow discusses the eco- 
nomic burden of premature death. Winslow concludes that a life of fifteen 
years or less represents a net economic loss to society because of the costs 
of nurturing a child; a life of forty represents a net economic gain; and a 
life of sixty-five, a net gain more than twice as great. He then presents 
statistics which correlate annual income per capita, food supply (calories 
per day), and physicians per hundred thousand population with life ex- 
pectancy at birth. Individuals in developed areas have a life expectancy of 
sixty-three years; in intermediate areas, fifty-two years, and in under- 
developed areas, thirty years. It is difficult to understand why Winslow failed 
to note that life expectancy figures are influenced tremendously by rates of 
infant and child mortality. It is crucial from every point of view whether 
life expectancy in developed areas is twice as great as in underdeveloped 
areas because adults die on the average at thirty in underdeveloped areas or 
because of high infant mortality in underdeveloped areas, which of course 
would depress the average. 
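
The arithmetic behind this objection can be sketched as follows. All figures are hypothetical and chosen only to show that two very different mortality patterns can yield nearly the same life expectancy at birth; none comes from Winslow's monograph.

```python
def life_expectancy_at_birth(groups):
    """Weighted mean age at death.
    groups: list of (share_of_births, mean_age_at_death) pairs."""
    total_share = sum(share for share, _ in groups)
    assert abs(total_share - 1.0) < 1e-9, "shares must sum to 1"
    return sum(share * age for share, age in groups)

# Hypothetical pattern A: everyone dies at about thirty.
adults_die_at_thirty = [(1.0, 30.0)]

# Hypothetical pattern B: 40 per cent die at about age one,
# while the survivors live on average to fifty.
high_infant_mortality = [(0.4, 1.0), (0.6, 50.0)]

print(life_expectancy_at_birth(adults_die_at_thirty))
print(life_expectancy_at_birth(high_infant_mortality))
```

Both patterns give a figure close to thirty years, yet in the second an adult who survives childhood can expect to reach fifty, which bears very differently on any estimate of economic loss.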

In the same chapter Winslow presents without critical comment Ewing’s 
estimate that the annual cost of illness in the United States reaches the 
staggering total of $38 billion, including $11 billion as the cost of premature 
death, $5 billion as the cost of short-term illness, and $22 billion as the cost 
of partial or total disability. This presentation is followed by a summary 
of the high cost of malaria, tuberculosis, and venereal disease in various 
parts of the world. Toward the end of the chapter Winslow states: “It is 
certainly clear that the cost of preventable diseases imposes a staggering 
burden upon the human race.” The statement can be accepted on faith but 
not by virtue of the potpourri of disability rates, mortality statistics, and 
economic estimates of the value of human life that precede it. 

In Chapter 2, “Methods of Reducing the Burden of Disease, and the 
Economic Results Attained,” Winslow presents an array of illustrations to 
show the achievements in environmental sanitation, particularly the con- 
trol of cholera, typhoid fever, various forms of dysentery, and, above 
all else, malaria. It is an impressive story of how relatively small expenditures 
can increase the effective years of life, reduce morbidity, and contribute to 
raising the level of worker productivity. Winslow stresses the fact, however, 
that no such easy victories can be achieved over contact-borne diseases, 
such as tuberculosis and syphilis, although here, too, important gains have 
been registered. Although he believes that the use of the BCG Vaccine for 
the control of tuberculosis and penicillin campaigns for the control of 
venereal diseases can make a contribution, they must be integrated in 
permanent health programs to be fully effective. In discussing the outlines 
of a program for promoting positive health, Winslow stresses what improved 
sanitation and control of the spread of infection from person to person can 
accomplish in the field of maternal and child health, and the significant 
role of nutrition in the establishment and maintenance of sound health. 

One of the least satisfactory sections of the book, from the viewpoint of 
this reviewer, deals with the “promotion of mental health.” It is not enough 
to call attention to the fact that the costs of mental illness are very great 
both to the individual and to society, particularly if one adds to the costs 
of hospitalizing psychotic patients the very large losses in effective per- 
formance which flow from neurotic disturbances in the population at large. 
This reviewer does not believe that any contribution is made to public 
health or the mental hygiene movement by such statements as the following: 
“After all, the ideal objective for public-health workers is to change the 
habits and attitudes of mankind; and this is a problem in mental hygiene 
and in health education.” This sounds more like Moscow than New Haven! 

In Chapter 3, “Planning a National Health Programme and Its Cost,” 
Winslow argues that “the greatest good of the greatest number should be the 
ultimate criterion of the soundness of a health programme... Such a (pro- 
gramme) should recognize limitations of money, of physical facilities, and 
of health personnel.” Winslow begins with a discussion of the almost uni- 
versal shortage of personnel, and quite correctly stresses the importance of 
utilizing sub-professional personnel wherever possible to alleviate serious 
manpower stringencies. In discussing physical facilities, the author accepts 
with only a minor reservation the astronomical shortages in hospital facilities 
in the United States which have been determined by medical planners. He 
does note, however, that home care or ambulant clinic care can be substi- 
tuted for intramural care with financial economy and benefit to the emo- 
tional health of the patient. It is not only tantalizing but disturbing to have 
the author suggest in one sentence that the United States needs an additional 
800,000 beds, and in the next indicate that perhaps the approach of building 
additional intramural facilities would be costly and unnecessary. It may be 
unfair to ask any expert to resolve all the differences of opinion that exist 
concerning the best ways of bringing the advances in preventive and cura- 
tive medicine to the public. But it is not unfair to suggest that a positive 
resolution of certain basic issues, such as the need for hospital facilities, 
should have been presented. 

In discussing costs, Winslow concludes that a reasonably complete pre- 
ventive health service can be provided in the United States for about $4.00 
per year per capita, and that medical care insurance costs about $40.00 
per year per capita. This reviewer has no basis for questioning the first 
figure, but he has every reason to challenge the second. Even for selected 
populations, hardly a sound basis for social planning, the $40.00 figure for 
medical care insurance is much too low. 

Winslow stresses the fact that in countries where financial resources are 
small and health problems serious, there is great danger that the preventive 
aspects of medicine which would prove most fruitful in the long run will be 
neglected. Actually this danger exists in almost every country, rich or poor. 
A single example would be the worsening plight of the preventive services 
in Great Britain since the advent of the National Health Service. Winslow 
quotes Biggs, the great New York Public Health Administrator, whose 
slogan was “Public Health is purchasable. Within natural limitations, a 
community can determine its own death rate.” He pleads for a practical 
system of priorities. Yet the materials which he presents in his concluding 
chapters underline the difficulties of developing such priorities. 

In Chapter 4, “Interrelationships of Poverty and Disease,” Winslow indi- 
cates that poverty is associated with an excessive burden of preventable 
disease. Without challenging this conclusion, it might be interesting to 
speculate on certain important correlations that have begun to emerge 
between income and life expectancy. The day may be near when a new type 
of balance will be established on a two-generation basis. One man may work 
under such stress that he accumulates enough money to give his children 
an excellent start in life and consequently long life expectancy, but the price 
he pays is his own premature death. Another man who does not work so 
hard may live longer but some of his children may die or be handicapped 
because he is unable to provide certain advantages for them in childhood. 
This is a further adaptation of a nonstatistical observation of an American 
social scientist who noted that in old graveyards in New England, John 
Smith was buried with his three wives beside him, while in the large modern 
cemeteries near urban centers, Jane Smith is buried with her three husbands 
beside her. 

Winslow discusses the fear of overpopulation held by some investigators 
and concludes that it offers no valid ground for relieving the public health 
worker of his responsibilities to control preventable disease. He sees no 
ground for the defeatist attitude of the “man-under-nature” concept of 
human destiny. Nor does he believe in the “man-over-nature” philosophy, 
for he fears that its proponents will “bruise themselves on the jagged sur- 
faces of reality. The position of the true scientist should lie between these 
two poles. A ‘man-with-nature’ approach, recognizing both the facts of 
life and the human aspirations, which are equally a part of nature, is a crea- 
tive force. . . .” Winslow goes on to emphasize that health aims cannot be 
attained in many areas without control of soil erosion, irrigation of desert 
areas, improvement in plant breeding and the use of fertilizers, control of 
animal and plant diseases, as well as the development of local industries, 
increased power resources, production of farm machinery, and competent 
government services. 

In his last chapter, “Programme of Technical Assistance,” he stresses the 
need for the more developed countries to aid the underdeveloped countries, 
and the need for great care to insure the appropriate levels of coordination. 
He quotes approvingly the principles of health programs recently formulated 
by the World Health Organization, which stress the interrelationships 
between health and the economic and social conditions in the country: 
“Public-health officers have for long affirmed that economic development 
and public health are inseparable and complementary and that the social, 
cultural and economic development of a community, and its state of health, 
are interdependent.” 

The limitations of this monograph are a reflection of the very difficult 
assignment of the author. It is simply not possible to deal comprehensively 
with the cost of sickness and the price of health on a world-wide basis 
within the compass of a hundred pages or so. But there is much more to 
commend than to question or criticize in this monograph. It is an exciting 
little book that should be read by every social scientist with an interest in 
the realities that help to determine human welfare. Winslow states that 
“The promotion of the health of the peoples of the world is basically a 
moral—not an economic—issue.” This reviewer would like to modify this 
statement to read as follows: The promotion of the welfare of the peoples 
of the world is basically a moral issue. It depends primarily on the wise 
use of our economic resources and on the maximum application of the 
potentialities of preventive and curative medicine. 


Studies in Income and Wealth, Volume 14 (Conference on Research in Income 
and Wealth). New York: National Bureau of Economic Research, 1951. Pp. x, 
276. $3.50. 

J. A. Nordin, Iowa State College 


THIS volume’s papers and comments made up the National Bureau’s 
April 1950 Conference on Research in Income and Wealth. The common 
reference point is a concern with incorporating analysis of assets and debts 
into economic theory. The present work supplements the materials presented 
in volume 12 of the same series of studies (papers and comments at a con- 
ference in January 1948). 

The papers can be divided into two groups: those that attempt to measure 
wealth, and those that analyze economic relations in which wealth is rele- 
vant. In the first group are papers by Raymond W. Goldsmith, Horst 
Mendershausen, Dwight B. Yntema, and Allen D. Manvel. In the second 
group are papers by Daniel H. Brill, Lawrence R. Klein, and K. E. Boulding. 

All the papers in the first group suffer from a common weakness; none 
of them contains a statement about the specific objectives to be pursued. 
Presumably a sensible research project begins with the formulation of a 
problem dealing directly with human behavior. “How wealthy are we” does 
not qualify as a research question, on this basis. “How does the volume of 
wealth affect current savings?” does qualify. The significance of this dis- 
tinction appears at many points. For instance, Goldsmith and Simon 
Kuznets (who comments on Goldsmith’s paper on a perpetual inventory of 
wealth) disagree on the importance of getting annual estimates of wealth. 
But it seems impossible to reach a useful conclusion on this issue without 
deciding what we want to use the estimates for. Mendershausen and Gold- 
smith mention several shortcomings of the method by which they estimate 
the wealth of the rich from federal estate tax returns; but the seriousness 
of a shortcoming depends on the purpose for which the estimates are to be 
used. If estimates for the rich are to be used in conjunction with estimates 
for the poor, the difference in methods used for the two groups may make all 
the estimates virtually useless. 

In the research-problems group of papers, Brill deals with the following 
problem: if all new lending ceased, no loans were renewed on becoming due, 
and all interest and amortization payments were made, what would be the 
magnitude of the resulting change in income? He presents evidence that 
postwar changes have reduced the potential depressive power of the debt 
structure. A comment by A. G. Hart emphasizes a difficulty mentioned 
briefly by Brill: until we have considerably more detail about the assets 
and debts of relatively small groups of families, we cannot make reliable 
predictions about the extent to which exogenous changes will affect con- 
sumption spending. 

Klein uses the Surveys of Consumer Finances (conducted for the Federal 
Reserve Board by the Michigan Survey Research Center) in attempting to 
explain the ratio of current savings to current income. The series used as 
bases of prediction are the following: liquid assets, percentage change of 
income from the last preceding income period to the current income period, 
and age of the head of the household. Separate computations are made for 
home-owners and for renters; also separate computations are made for 
families whose incomes increased and for families whose incomes decreased. 
The study is a cross-section study; prices are constant during the investiga- 
tion. Klein’s principal conclusion seems to be that liquid assets have an 
inverse effect on savings. He notes that since prices have not changed, his 
conclusions do not permit him to decide on the correctness of Pigou’s con- 
tention that a rise in real liquid balances decreases real savings. 

Klein’s careful treatment of the statistical difficulties encountered is 
enlightening. However, there is some question about his choice of liquid 
assets rather than all assets owned by households. In time series studies 
we may not be able to use accumulated stocks, since variation in them may 
be insignificant over the time period covered. But this difficulty does not 
appear in cross-section studies such as Klein’s. Presumably consumers’ 
holdings of nonliquid assets vary considerably, and there can be no serious 
doubt that current decisions are influenced by these holdings. 
Probably carrying liquid and nonliquid assets as separate variables would 
be helpful. For a given income and a given (small) amount of liquid assets, 
a large amount of nonliquid assets may indicate that the income receiver 
is familiar with investment channels, and can be expected to save at a high 
rate; on the other hand, if the degree of familiarity with investment chan- 
nels does not vary significantly, a large amount of nonliquid assets may 
indicate that the income receiver has nearly completed his investment 
program, and can be expected to save at a low rate. Klein makes a crude 
attempt to show the separate effect of nonliquid assets by dividing families 
into a home-owning group and a non-home-owning group. But this procedure 
is not very helpful, since nothing is said about the sizes of mortgages. If 
liquid and nonliquid assets cannot be carried separately, lumping them 
together appears preferable to using only liquid assets. 

Boulding outlines a macroeconomic theory of distribution using asset 
and debt identities constructed from balance sheets of firms and households. 
One identity is solved for total wages, and another for total profits. Certain 
novel statements which constitute Boulding’s theory of distribution appear 
to proceed from unfortunate use of the “other things being equal” assump- 
tion. If two variables are independent of each other, it is legitimate to study 
the effect of one while assuming constancy in the other. But Boulding uses 
the assumption where the variables concerned are not independent of each 
other. For instance: the relation of total profits to total wages depends partly 
on a transfer factor that includes both the balance of payments of house- 
holds to businesses and the indebtedness of businesses to households. We 
are told that “... the sale of bonds by businesses to households actually 
diminishes total profits, the other variables [in the transfer factor] being 
constant.” (p. 238). Boulding writes this when he is considering the indebted- 
ness of businesses to households; but the sale of bonds will change the 
balance of payments from households to businesses as much as it changes the 
indebtedness of businesses to households. Therefore it is futile to discuss the 
effect of bond sales through the indebtedness item, making “other things 
equal.” This objection is raised by McKean’s comments. Boulding states 
that he accepts some of McKean’s objections, but does not indicate his reac- 
tion to this particular one. It appears that accepting it would virtually 
eliminate Boulding’s outline of the macroeconomic theory of distribution. 
This volume contains material that is likely to be useful in further research 
on problems in which wealth is relevant. However, a conference on method- 
ology might well precede further attempts at measurement. 


Labor Force, Employment, and Unemployment. Annual Estimates by States; 
1900-1940. John P. Herring. Seattle, Washington: University of Washington 
Press, 1951. Pp. iii, 55. Paper. 


Cedric Wolfe, Metropolitan Life Insurance Company 


THIS study attempts to present, for all States on a uniform basis, con- 
secutive annual estimates from 1900 to 1940 for the three variables listed 
in the title. It uses the decennial Census figures on gainful workers as bench- 
marks for labor force by States—figures which are recognized by the author 
as not comparable in concept from Census year to Census year. 

Labor force in inter-Censal years for each State is interpolated by use of 
annual estimates made by the National Industrial Conference Board for 
total United States. The latter national estimates of labor force are them- 
selves interpolations between decennial Censuses, but some refinements had 
been used by the Board, e.g., taking account of changes in composition of 
population during intervening years, fluctuation in school population, 
mortality, and net immigration. 
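
The kind of benchmark interpolation described above can be sketched in Python. The rule used here (a State's share of the national total moves linearly between two Census benchmarks and is then applied to the annual national series) is an assumption chosen for illustration; the review does not give Herring's exact formula, and every figure below is hypothetical.

```python
def interpolate_state_series(state_bench, national, year0, year1):
    """Interpolate a State series between two benchmark years.

    state_bench: {year0: value, year1: value}, the State's Census benchmarks.
    national:    {year: value}, annual national estimates.
    Assumption (not from the study): the State's share of the national
    total changes linearly between the two benchmark years.
    """
    share0 = state_bench[year0] / national[year0]
    share1 = state_bench[year1] / national[year1]
    span = year1 - year0
    result = {}
    for year in range(year0, year1 + 1):
        weight = (year - year0) / span
        share = share0 + weight * (share1 - share0)
        result[year] = share * national[year]
    return result


# Hypothetical figures in thousands of workers.
national = {1900: 28000.0, 1901: 28700.0, 1902: 29500.0}
state_bench = {1900: 280.0, 1902: 354.0}
series = interpolate_state_series(state_bench, national, 1900, 1902)
```

Under this rule the benchmark years are reproduced exactly, and every movement between them is inherited from the national series.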

Thus, as the author recognizes, his estimates for each State purport to 
measure the hypothetical “normal” labor force instead of the “actual” 
labor force in each year. As a result of this, he produces “negative” unem- 
ployment at times in all States, because his estimated employment some- 
times exceeds labor force, just as was the case with the national figures of 
the Conference Board. In modern definitions of labor force, the latter always 
exceeds employment. Furthermore, plausible changes are recorded, e.g., 
uneven annual movements and occasional shifts in direction, owing to such 
factors as boom and depression and total War. Locally, i.e., among individual 
States, such movements are doubtless even more pronounced. The author’s 
estimates of labor force cannot reflect such changes, through no fault of his, 
of course. The basic data just do not exist. 

In making estimates of employment by States back to 1900, Herring is 
skating on thin ice, in the opinion of this reviewer. In the earlier years, he 
estimates each State’s employment by applying to his estimates of labor 
force for that State the United States ratio of employment to labor force, 
derived also from the above-mentioned Conference Board series. Since 
national estimates of actual number of persons employed were necessarily 
cruder during this period than in the later years covered by Herring, such 
U. S. ratios are doubtless subject to whatever errors are embodied in their 
numerators, let alone any imperfections in their denominators. Of course, 
he makes the customary adjustments to whatever Census bench marks are 
available for State employment in the later Census years. It can be seen 
readily that the year-to-year changes in each State’s employment are influ- 
enced almost entirely by the fluctuations in the Conference Board’s national 
employment total. 
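
A minimal Python sketch of the ratio method described in this paragraph, with hypothetical figures. It shows how applying the national employment-to-labor-force ratio to a State's "normal" labor force yields "negative" unemployment whenever that ratio exceeds one.

```python
def state_employment(state_labor_force, us_employment, us_labor_force):
    """Apply the national employment / labor-force ratio to one State."""
    return state_labor_force * (us_employment / us_labor_force)


# Hypothetical figures in thousands; in this boom year national employment
# exceeds the interpolated "normal" labor force.
us_labor_force = 30000.0
us_employment = 30300.0
state_lf = 500.0

emp = state_employment(state_lf, us_employment, us_labor_force)
unemployment = state_lf - emp  # residual; negative when the ratio exceeds 1
```

Because the same national ratio is applied to every State, each State's employment series moves with the national total.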

If such State figures for employment appear to be good, it is naturally 
because national cyclical fluctuations, particularly the more distinct and 
severe ones, are common to many individual States, or the national total 
would not have moved cyclically in the first place. This reviewer would not 
care to rely on Herring’s estimates for the purpose of determining whether 
employment in a particular State actually did rise or decline between two 
consecutive years. Similarly, one could not be sure that the years in which 
his tables indicate a cyclical peak or trough for a State were the exact dates 
when those events actually did take place. Nor would the reviewer have 
confidence in any table comparing between two dates the extent of change, 
percentagewise, among the various States. (If a certain State showed the 
sharpest or slightest decline between a cyclical peak and trough, could one 
be reasonably sure that this had actually happened?) 

Obviously, in light of the above, the reviewer is completely skeptical of the 
unemployment estimates produced for each State in this study. And not 
alone on the ground that they are derived as a residual—labor force minus 
employment—hence subject to greater error than either of these two larger 
magnitudes (a point with which the author does not agree). The reviewer 
cannot rid himself of the feeling that changes in comparability of labor 
force coverage from Census to Census must have considerably affected 
the accuracy of the derived unemployment estimates—much more than they 
affected either employment or labor force itself. 
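
The point about residuals can be made concrete with a small worst-case calculation in Python. The figures are hypothetical, and the error bound used (absolute errors simply add) is an illustrative assumption rather than anything computed in the study or the review.

```python
def residual_relative_error(labor_force, employment, lf_error, emp_error):
    """Worst-case relative error of unemployment = labor_force - employment.

    lf_error and emp_error are absolute errors in the units of the series.
    """
    unemployment = labor_force - employment
    worst_case_error = lf_error + emp_error
    return worst_case_error / unemployment


# Hypothetical figures in thousands, each series assumed good to 2 per cent.
lf, emp = 1000.0, 950.0
lf_err, emp_err = 0.02 * lf, 0.02 * emp
rel = residual_relative_error(lf, emp, lf_err, emp_err)
```

Here two series that are each accurate to 2 per cent leave a residual of 50 with a possible error of 39, or 78 per cent of the residual.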

In anticipation of criticism, Herring notes in his booklet practically all 
the objections that can be raised to the techniques and data used by him, 
and he also mounts a powerful defense on some of these. Very frequently in 
his study he emphasizes that the annual figures produced by him are better 
than having no State estimates at all! 


The Nature and Tax Treatment of Capital Gains and Losses. Lawrence H. Selt- 
zer (Professor of Economics, Wayne University) with the assistance of Selma F. 
Goldsmith and M. Slade Kendrick. New York: National Bureau of Economic Re- 
search, Inc., 1951. Pp. xxii, 554. $7.50. 


George G. Hagedorn, National Association of Manufacturers 


This work should immediately assume the status of the basic source of 
[illegible] for economists, on the subject of capital gains. It is thorough, 
objective, and authoritative. Although its chief use will be as a reference 
work, the text is thought-provoking and surprisingly readable. 

Mr. Seltzer’s book seems to this reviewer to be an ideal model for this 
type of economic treatise. The author examines all sides of each question 
and does not unduly press his interpretations upon the reader. At the same 
time he has not produced the sterile and formless type of research report 
in which conclusions are avoided and no point of view is assumed. 
For example, a chapter is devoted to the question: “Are Capital Gains 
Appropriate Elements of Taxable Income?” Professor Seltzer discusses the 
arguments for believing that capital gains represent less taxpaying capacity 
than other income and decides that they “... require not so much the 
complete exclusion from taxable income of all sporadic receipts and losses 
as some effective provision for averaging them over the incomes of several 
years.” In the same chapter the author points out that a tax on capital 
gains is not really a tax on capital, as some allege, since it “ . . . cannot liter- 
ally be paid out of the existing real capital of society as a whole, and there- 
fore cannot directly reduce that capital.” However, “There is ground for 
believing that taxes on capital gains tend to absorb potential current savings 
to a greater degree than taxes on ordinary components of income.” Further 
on it is pointed out that “... the unimpeded movements of various kinds 
of property holdings . . . is not only in the interest of the individual but in 
that of society as a whole. . . . The tax on realized capital gains may be 
charged, therefore, with preventing the optimum use of capital assets.” 
These brief quotations can merely suggest the many considerations which 
are brought up and carefully considered in this one chapter. 

Another important question treated in the book is whether the revenue 
yielded from capital gains taxation would be increased if the tax rate were 
lowered. It is argued in some quarters that if the tax rate on capital gains 
were lowered, individuals would be less reluctant to realize such gains, and 
the increase in the amount realized would more than offset the rate reduction. 
Mr. Seltzer finds little merit in this theory, since his statistics indicate that 
the amount of realized capital gains is far more sensitive to other economic 
factors than to changes in the tax rate. 

A chapter is devoted to a historical summary of the legal treatment of 
capital gains. Another chapter analyzes the fundamental economic nature 
of capital gains. Still another reports on their tax treatment in foreign 
countries. The final chapter discusses in detail the various proposals for tax 
treatment of capital gains and losses. It is without doubt the most complete 
and competent analysis ever made of the many controversial issues in this 
field. 

Some striking new facts are brought out in this book. Thus for the thirty 
year period 1917 to 1946 net capital gains realized by individuals filing in- 
come tax returns exceeded their net capital losses by only $16 billion. Trans- 
actions in common stocks of corporations are by far the most important 
sources of capital gains and losses, while real estate transactions are a com- 
paratively minor element in the picture. Since 1922 there has been a steady 
decline in the importance of short term gains and losses as compared with 
long term. 

Statisticians will find little in the way of novel or complex methodology. 
The methods are, however, fully adequate. The statistical appendix con- 
tains 98 tables and can best be described as exhaustive. It presents practi- 
cally all of the available statistical information, classified in every way that 
could conceivably be of interest. The data are derived chiefly from compila- 
tions of the Bureau of Internal Revenue, published and unpublished. Much 
more is involved than merely transcribing figures from the sources. Gaps in 
the data and changes in the tax law make this a job for experts. Both the 
tabular material and the description of “Sources and Methods” should be 
invaluable for future students of the subject. 
There ought to be more basic economic studies of this quality. 


History and Policies of the Home Owners’ Loan Corporation. C. Lowell Harriss. 
New York: National Bureau of Economic Research, Inc., 1951. Pp. xix, 204. 
$3.00. 


GLENN H. Beyer, Cornell University 


This is the fourth in a series of studies completed under the Urban Real 
Estate Finance Project of the Finance Research Program of the National 
Bureau of Economic Research. It presents a detailed historical description 
of the Home Owners’ Loan Corporation, which was established during the 
depression of the early 1930’s (1933) “to help families prevent the loss of 
their homes through foreclosure,” which made approximately one million 
loans amounting to around $3 billion, and which finally liquidated those 
loans at a slight profit. 

In addition to discussing such important matters as the characteristics 
of HOLC borrowers, properties and loans, and the foreclosures which 
were required, much attention is given to the administrative organization 
of the agency and how it operated. For example, one chapter is devoted to 
the problems of organization and staffing, another to policies and methods of 
property management, still others to loan servicing procedures, the agency’s 
appraisal policies, and finally the financial liquidation of the agency. 

We can be grateful that the National Bureau of Economic Research 
conducted this study as a part of its important series. The study is in a 
somewhat different category from those previously completed (and the two 
to follow) inasmuch as the others do not have a historical character; that is, 
they concern themselves with current ever-changing problems in the field 
of urban mortgage financing. Yet it was undoubtedly included because a 
few years ago the HOLC was one of the most important financing institutions 
in this field. Its history would only become more difficult to record in future 
years. 

The two chapters of the report contributing most to existing knowledge 
in the area of urban mortgage lending are Chapters 4 and 6. The former 
discusses “Characteristics of HOLC Borrowers, Properties, and Loans” 
and the latter “Foreclosures,” including the reasons for foreclosure and a 
description of factors affecting foreclosure experience. 

In order to obtain the data for these chapters it was necessary to draw 
a restricted sample of one out of every thirty HOLC loan records in the 
original HOLC New York Region, including the states of Connecticut, 
New Jersey and New York. (Owing to the inaccessibility of records, only 
every sixtieth loan was sampled at one of three locations where loan records 
were stored, and the sample was inflated accordingly.) The report recognizes 
that a sample of all HOLC loans would have been more desirable, but “the 
location and organization of available basic records made it impractical 
for a sample with national coverage to be drawn.” It was necessary for the 
report to recognize that the New York region loans might have been “broadly 
similar in certain respects to the national total of loans but different in other 
important respects.” Therefore, what these two chapters report is the result 
of a study of a sample of loans in the three states, not the nation. 
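
The "inflation" of the one-in-sixty records can be sketched in Python as a weighting by the sampling interval. The counts below are hypothetical and the function name is illustrative; the report's own tabulation procedure is not described in the review.

```python
def weighted_proportion(strata):
    """Estimate a population proportion from unequal sampling fractions.

    strata: list of (sampled_count, count_with_attribute, sampling_interval),
    where sampling_interval is 30 for a 1-in-30 sample and 60 for 1-in-60.
    Each sampled record stands for `sampling_interval` loan records.
    """
    total = sum(n * k for n, _, k in strata)
    with_attribute = sum(a * k for _, a, k in strata)
    return with_attribute / total


# Hypothetical counts for three storage locations; the third was sampled
# at half the rate of the other two.
strata = [(400, 80, 30), (300, 60, 30), (100, 40, 60)]
p = weighted_proportion(strata)
unweighted = (80 + 60 + 40) / (400 + 300 + 100)
```

In this made-up case the weighted estimate is about 0.244 against an unweighted 0.225, because the thinly sampled location has a higher rate of the attribute.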

Some of the important findings regarding the characteristics of the families 
applying for loans are as follows: over half of the families had monthly 
incomes from $50 to $150 and two-thirds of the applicants were from 35 to 
55 years old. Regarding the characteristics of the property: most of the 
properties on which the loan application was made were less than 15 years 
old, one-third of the properties were used to some extent for business 
purposes, 87 per cent had central heating, and 84 per cent had the same num- 
ber of baths as families (probably to be generally interpreted as one). Among 
the important factors associated with the foreclosures were the following: 
foreclosure rates were higher among those families who were “over-housed” 
(defined as those circumstances where the value or cost of the dwelling is 
higher than the family’s economic position justifies); higher than average 
foreclosure rates were associated with (a) the younger and older borrowers, 
(b) properties having some business use, (c) properties of higher values, 
more rooms, and high ratios of land to total value, (d) loans for larger 
amounts, and (e) higher loan-to-value ratios. 

An example of the difficulties of analysis is provided in the discussion of 
the reasons for foreclosure. The report states, “The HOLC attributed 
nearly half of its foreclosures to non-cooperation of the borrower, although 
one cannot distinguish in all cases between this group and the two others of 
next greatest importance, namely, obstinate refusal to pay and total inability 
to pay.” It continues, “A borrower classed as ‘non-cooperative’ had, in the 
opinion of the HOLC agent, some chance of either carrying the loan or 
selling the property with some salvage of equity, but lacked the determina- 
tion to make the necessary efforts and sacrifices.” These types of sociological 
reasons would appear most difficult if not impossible to classify, and it ap- 
pears that a more precise answer to the reasons for foreclosures will not be 
forthcoming. 

The author limits his report quite definitely to the data and other infor- 
mation at hand and, unfortunately, does not offer to develop implications 
with respect to another possible decline in the building cycle, with the ex- 
ception of expressing the general hope that “some of HOLC’s experience 
may be useful in improving one vitally important part of American life— 
the financing of home purchases.” The recording of this agency’s administra- 
tive procedures and policies is excellent, but the reader must depend pri- 
marily on the conclusions described above, with some additions not of equal 
importance, to accomplish the author’s goal. 


Productivity, Supervision and Morale Among Railroad Workers. Daniel Katz, 
Nathan Maccoby, Gerald Gurin, and Lucretia G. Floor. Ann Arbor, Michigan: 
Survey Research Center, Institute for Social Research, 1951. Pp. xii, 61. 


Dale Yoder, University of Minnesota 


This is a type of report to which statisticians should give attention, in 
part because applications of statistical techniques to problems of man- 
power management are appearing with increasing frequency and in part 
because research in that field needs the constructive suggestions of statis- 
ticians. Empirical studies based on sampling employee and group produc- 
tivity, employee morale, attitudes of supervisors, and employment policies 
and practices are achieving increasing acceptance, both in this country and 
abroad. They are designed to substitute demonstrated functional relation- 
ships—for example, relationships of morale to productivity or of certain 
practices to morale and/or productivity—for older, widely accepted dogmas 
with respect to these relationships. 

Studies undertaken for this purpose face many complicated statistical 
problems. Measurement in the field of employment relationships is for the 
most part a novelty. Scales must be developed and validated. Criteria often 
involve complex indexes, with problems of weighting and cross-validation. 
In facing and solving these problems, statisticians have an important part 
to play. 

The study under consideration is one of a series of studies undertaken by 
the Survey Research Center to discover principles “which contribute both 
to the productivity of the group and to the satisfaction that the group mem- 
bers derive from their participation.” An earlier study considered relation- 
ships among productivity, supervision, and morale in the home office of the 
Prudential Insurance Company in Newark, New Jersey. 

The study reported here sought to discover similar principles as illustrated 
by the behavior and attitudes of section crews on the Pere Marquette 
District of the Chesapeake and Ohio Railroad. The problem was the extent 
to which levels of productivity of section gangs are related to their super- 
vision and to the morale of employees. 

The first chapter describes the design of the study. Pairs of section gangs 
were first selected on the basis of comparable technical factors affecting 
production—mileage, ballast, weight of rails, terrain and topography, and 
numbers in each work gang. As is generally the case in such studies, no 
satisfactory objective measure of productivity is available. To provide a 

574 AMERICAN STATISTICAL ASSOCIATION JOURNAL, SEPTEMBER 1952 


criterion, Division Engineers and Track Supervisors rated selected groups, 
indicating which of each pair was doing a better job. The report outlines 
comparisons of these “high” and “low” groups in the 36 paired section 
crews—a total of 298 employees. Chi-square tests were applied to each com- 
parison. Results are regarded as acceptable if they meet the five per cent 
level of significance. 
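
A chi-square comparison of "high" and "low" groups at the five per cent level might look like the following Python sketch. The 2 x 2 layout, the absence of a continuity correction, and the counts are all assumptions made for illustration; the review does not reproduce the report's tables.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for
        [[a, b],
         [c, d]]
    """
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Critical value of chi-square with 1 degree of freedom at the 5% level.
CRITICAL_5_PERCENT_DF1 = 3.841

# Hypothetical counts: rows are high and low sections, columns are the
# two answers to one questionnaire item.
stat = chi_square_2x2(90, 60, 65, 83)
significant = stat > CRITICAL_5_PERCENT_DF1
```

With these counts the statistic is roughly 7.7, so the difference would be accepted under the five per cent rule.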

Chapter II reports findings with respect to productivity and supervision; 
Chapter III describes the relationship of productivity and employee morale; 
and Chapter IV compares these findings with those of the earlier Prudential 
Study of clerical employees. Four appendices describe the tools and tech- 
niques used in the study. 

The study reveals that while high and low foremen do not differ signifi- 
cantly in the degree of their job satisfaction and other aspects of the work 
situation, high foremen are more aware of their leadership role and spend 
more time supervising and planning work. The high foremen “are more 
positive toward their men, take a more personalized approach to them and 
give more attention to problems of their motivation.” So far as productivity 
and morale are concerned, employees in high and low section gangs do not 
show significant differences in satisfaction with the over-all work situation, 
the company, their job status or wages. But those in high section gangs 
regard their work groups more highly than those in low sections. And more 
members of the low sections show strong intrinsic job satisfaction. 

Most of these findings confirm the results of the Prudential study. In 
general, they indicate that effective supervisors assume more leadership, 
show greater concern for employees, and feel less pressure from higher 
echelons. Members of more productive groups evaluate their own teams more 
highly and show less intrinsic satisfaction with their jobs. 


The Development of Bank Debits and Clearings and Their Use in Economic 
Analysis. George Garvy. Washington: Board of Governors of the Federal Reserve 
System, 1952. Pp. viii, 175. 25 cents. Paper. 


Henry H. Villard, The City College, New York 


Published as a Federal Reserve technical paper, George Garvy’s study of 
two measures of the use of money and of the contribution that they can 
make to economic analysis is the best available discussion of the meaning 
and usefulness of the series in question. Mr. Garvy starts with a comparison 
of clearings and debits statistics; the former comprises the total of checks 
passing through clearing houses including drafts on interbank accounts, 
while the latter excludes interbank accounts but includes checks cashed, 
collected directly, or drawn on accounts in the bank in which they are de- 
posited. Both series reflect the high level of financial transactions in New 
York City, but in general debits are more inclusive and representative, 
especially for smaller centers where clearing houses play an unimportant 
role. 
The question of why debits exceed gross national product is next discussed. 
This occurs not only because the “main money circuit” may include several 
sales of output before it reaches the final consumer, but also because of 
“technical” or “money-to-money” transactions, which have been dubbed 
“fluff.” Though “fluff” has recently been decreasing in importance, it still 
amounts to half of “main circuit” transactions in the country as a whole and 
perhaps six times such transactions in New York City taken separately. 
Turning to cyclical behavior, Mr. Garvy concludes that clearings corre- 
sponded well with aggregate economic activity from 1878 (when the series 
first became available) to World War I, but that thereafter even debits did 
not provide a satisfactory monthly chronology of movements in business. 
Efforts to relate debits to deposits in order to derive estimates of velocity 
are next reviewed; wide variations have been found to exist, with larger 
cities showing consistently higher turnover. 
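
The arithmetic behind the "fluff" proportions and the velocity ratio can be checked with a short Python sketch. It assumes that total debits are simply "main circuit" transactions plus "fluff", which is a simplification for illustration; the debit and deposit figures are hypothetical.

```python
def fluff_share(fluff_to_main):
    """Share of total debits that is fluff, given fluff as a multiple of
    main circuit transactions (assumes debits = main circuit + fluff)."""
    return fluff_to_main / (1.0 + fluff_to_main)


def deposit_velocity(debits, average_deposits):
    """Turnover of deposits: debits during a period per dollar of deposits."""
    return debits / average_deposits


national_share = fluff_share(0.5)  # fluff equal to half of main circuit
nyc_share = fluff_share(6.0)       # fluff six times main circuit

# Hypothetical annual debits and average deposits, in millions of dollars.
velocity = deposit_velocity(debits=2400.0, average_deposits=100.0)
```

On these assumptions fluff would be one-third of all debits nationally and six-sevenths in New York City.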

Passing to applications, Mr. Garvy first discusses the uses of debits and 
velocity series by Mitchell, Schumpeter, and Frickey in their business cycle 
analyses, by Snyder and the Harvard Economic Service in economic fore- 
casting, and by various authors in efforts to measure regional business activ- 
ity. He then turns to the use of velocity indexes in monetary theory—par- 
ticularly in connection with the controversies over the quantity theory—re- 
viewing the work of Kemmerer, Fisher, Anderson, A. F. Burns, Snyder, 
Copeland, and Keynes. A final chapter is devoted to a summary of findings 
with respect to the adequacy of clearings and debits for the various uses to 
which they have been put, and to a number of suggestions for improving our 
present debits series. The study concludes with three appendices devoted, 
respectively, to a discussion of the New York clearings series (which goes 
back to 1853), to the coverage of the New York debits series, and to the rela- 
tion between clearings and debits in selected centers. 

While obviously not a book of wide appeal, Mr. Garvy’s study is of the 
workmanlike quality we have come to expect of Federal Reserve publica- 
tions and will be much appreciated by students of monetary theory and 
behavior. 


De Juiste Maat, Lichaamsafmetingen van Nederlandse vrouwen als basis van 
een nieuw maatsysteem voor dames-confectiekleding (The Right Size, Body 
measurements of Dutch women as a basis for a new system of sizing ladies’ 
ready-to-wear clothing). J. Sittig (Consultation Bureau for Applied Statistics) 
and H. Freudenthal (Professor of Mathematics in the University of Utrecht). 
Leiden: L. Stafleu, 1951. Pp. 402. Fl. 20.—. 


H. S. Konijn, University of California (Berkeley) 


In this book J. Sittig presents a detailed report on an investigation he con- 
ducted in order to arrive at a recommendation for standards in the sizing 
of women’s ready-to-wear garments in the Western part of the Netherlands. 
The volume also includes a “Popular Summary of Methods and Results” by 
Dr. Freudenthal, and a mathematical appendix by the latter. There is a 
Summary, a list of tables, and a list of figures, all in English; little space need 
therefore be given in this review to a presentation of the conclusions of the 
report. The investigation was sponsored by the leading Dutch department 
store “De Bijenkorf.” 

The publication should be of interest to those concerned with (a) economy 
in production and distribution, particularly in the clothing and allied indus- 
tries; (b) anthropometry; (c) statistical methods; (d) methods of popular 
presentation of statistical investigations. 

(a) In many parts of Europe, as in the U.S.A., by far the greater part of 
clothing worn is bought ready made. It is therefore important that the stores 
carry an assortment of clothing such that most people may be readily fitted 
without high cost of alterations. In addition, standard sizing leads to sub- 
stantial savings in manufacturing and merchandizing. In the Netherlands, 
it has been felt that garments marked as being of a certain size vary con- 
siderably in actual measurements in the different dimensions (waist girth, 
back length, etc.) and in the relations between these dimensions, as between 
manufacturers as well as for any given manufacturer. The first part of this 
study confirms this feeling eloquently. 

The need for improvements in this respect, although not at all as pro- 
nounced, has also been felt in this country, where work along these lines was 
undertaken by the Department of Agriculture in cooperation with the 
Works Progress Administration.¹ 

(b) The American results are not applicable to the Netherlands because 
of substantial anthropometric differences between American and Dutch 
women and the far greater anthropometric homogeneity in the population 
of women from the Western Netherlands than in that of women from all 
over the U.S.A. The anthropometric methods used in the Netherlands 
differed from those of the American study in the following respects: 

(1) It was, wisely, decided to limit the number of body dimensions (in- 
cluding weight) to 15, rather than observe 59 as was done in the American 
study, and to analyze the measurements obtained rather more fully, also in 
relation to characteristics of the sampled individuals other than body di- 
mensions. 

(2) Measurements were made on hands and feet (they accounted for 5 of 
the 15 body dimensions observed). 

(3) Measurements were taken on the clothed bodies (during a warm 
month), since a sample of women consenting to be measured unclothed or 
in foundation garments would most likely have been unrepresentative, and 
since determination of the proper size of an article of ready-to-wear clothing 
for a lady customer in a retail store is generally made on the basis of measurements 
taken on the fully clothed body. For comparison, some data were 
obtained on unclothed and partly clothed individuals. 

¹ U. S. Department of Agriculture, Miscellaneous Publication 454, Women's Measurements for 
Garment and Pattern Construction, by Ruth O’Brien and William C. Shelton, Government Printing 
Office, Washington, 1941, 73 pp. + appendix tables. 

(c) The statistical methods adopted in the study under review incor- 
porated the following improvements upon those followed by the American 
group: 

(1) The Dutch study adopted a number of additional devices, such as 
multivariate scatter diagrams and lists of likely kinds of administrative 
errors, to weed out and to some extent rectify gross errors in the observations. 

(2) While both the American and the Dutch studies obtained estimates 
of the error variances of measurement by a certain amount of replication,² 
only the Dutch study used these estimates to make the rather substantial 
corrections to the observed distributions and moments they implied. Unfor- 
tunately the report does not present the detailed data on the distribution of 
the errors of measurement on which a judgment regarding the validity of 
the adjustments made would have to depend, such as the representativeness 
of the sample, the shape of the distribution of these errors, their possible 
dependence on the magnitudes of the dimensions that were ascertained by 
the operators, and the correlations among the errors of measuring different 
dimensions. 

(3) The American statisticians tried to find a small number of dimensions 
which would most closely predict the others, in the sense of minimizing as 
many as possible of the observed residual variances for what they judged 
were the most important measurements. They observed that the relative 
importance of the different dimensions varies from garment to garment, the 
length of the upper arm, for example, being of no importance with regard 
to short-sleeved dresses; evidently, they preferred this implicit and vaguely 
defined system of weighting. In contrast, Sittig deals with the problem in 
more specific economic terms, and seeks to establish a small number of di- 
mensions as a basis for a sizing system that would minimize the total cost of 
alterations. He invited three groups of experts employed in the better class 
ladies’ ready-to-wear clothing industry to state tolerances for the various 
dimensions of the “average dress” and used an average of these figures, as 
well as a tolerance fifty per cent higher, to estimate total alteration cost for 
“close-fitting” and “standard fit” clothing, respectively, under various alter- 
native sizing systems. 

To the reviewer Sittig’s method seems clearly better for the problem at 
hand, although its restricted applicability to or optimality for articles of 
clothing deviating in fitting requirements from the “average dress” should 
have been brought out and made more precise. 

(4) The determination of a proposed series of standard sizes on the basis 
mentioned above was also fairly arbitrary in the case of the American study. 
Both studies subdivided the population into equal intervals with respect to 
the dimensions chosen as a basis for the system,³ but the Dutch study 
chooses the widths of the intervals in accordance with the tolerances. 

² The diagram in the English summary states n = 17 for these replications. Here n represents the 
number of women on whom replicated measurements were performed; however, for the principal body 
dimensions the total number of measurements taken on these women amounted to 92. 

(5) The Dutch study also discusses the sensitivity of its conclusions to 
certain modifications in the assumptions and sampling errors. It does not, 
however, present their cumulative effect. A certain amount of uncertainty 
as to the closeness of the various approximations used in deriving the cost of 
alterations is not entirely removed by the sensitivity computations and 
could have been investigated directly from the observed distributions as ad- 
justed for measurement errors. 
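
The correction for errors of measurement described under point (2) can be illustrated with a minimal Python sketch. It assumes the simplest additive model, in which an observed value is the true value plus an independent error, so that the observed variance is the sum of the true variance and the error variance. The review does not give the formulas of the Dutch study, and the data below are hypothetical.

```python
from statistics import variance


def error_variance_from_duplicates(pairs):
    """Estimate measurement-error variance from duplicate measurements.

    For two independent measurements x1, x2 of the same quantity,
    E[(x1 - x2)^2] = 2 * (error variance).
    """
    return sum((x1 - x2) ** 2 for x1, x2 in pairs) / (2 * len(pairs))


def corrected_variance(observed, error_var):
    """Sample variance of the observations less the error variance."""
    return variance(observed) - error_var


# Hypothetical duplicate waist-girth measurements (cm) on four women.
pairs = [(70.0, 71.0), (82.0, 80.0), (65.0, 65.0), (90.0, 91.0)]
err_var = error_variance_from_duplicates(pairs)

# Hypothetical single measurements on the main sample.
observed = [70.0, 82.0, 65.0, 90.0, 76.0, 73.0]
true_var = corrected_variance(observed, err_var)
```

The sketch ignores exactly the complications the reviewer lists: errors that depend on the size of the dimension, and errors correlated across dimensions.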

While the reviewer judges Sittig’s methods an improvement over those of 
his predecessors, he wonders why the author did not go a bit further in his 
approach to the problem as an economic one. Sittig proposes a one-dimen- 
sional and a two-dimensional standard, the latter giving a one-fourth to one- 
third reduction in the cost of alterations over the former, and remarks that 
basing standards on an additional dimension would bring about a further 
diminution, but only about half as much, of the cost of alterations while 
greatly increasing the number of sizes.⁴ He gives no evidence of having con- 
sidered one- or two-dimensional systems where some but not all intervals or 
cells, respectively, are subdivided according to one or more additional di- 
mensions. From the data published it is not possible to ascertain the econo- 
mies such standards would achieve, but they may well be considerable, and 
that system is not unknown in manufacturing or merchandizing practice. 
Like the standards proposed by Sittig, not all subdivisions of such a standard 
series need to be produced by all manufacturers or carried in stock by all 
retailers. 

The following are minor comments. 

It is not clear to the reviewer why Sittig recommends standards for 
“standard fit” clothing with a tolerance twice that for “close-fitting” clothing 
in waist girth, and only one-and-one-half in the other dimensions. 

The notation does not always reflect clearly the distinction between popu- 
lation parameters and quantities computed from the sample. The factor 
1/100 on the bottom of page 116 should be 1/70. The r mentioned in the 
footnote on page 138 is not the same as the one mentioned in the text. 

The mathematical appendix adds little of any value to what is found in 
the main body of the report. An exception is the derivation of the formula on 
page 392; the closeness of this approximation is not, however, self-evident, 





3 It is interesting to observe that the aim of the American statisticians, as outlined under point 3 
above, would have been much more closely realized by uneven spacing; indeed, they noted this them- 
selves, but rejected it on grounds of simplicity and flexibility. But posing the problem as does Sittig 
brings out strong considerations which (even apart from simplicity or flexibility) favor spacing according 
to tolerances, and therefore at approximately equal intervals. 

4 The author ought also to have presented data on alteration cost for the case in which only 95% 
or 90% of the population would be provided with ready-to-wear clothing. 







but the reviewer found that a second term in the power series expansion would 
add very little numerically. The discussion of paragraph 10 of the appendix 
is intended to answer some questions concerning sizes to be stocked by retail 
stores, but is based on pretty arbitrary assumptions; it is not indicated how 
the (otherwise rather evident) verbal conclusions presented in the last eight 
lines of page 400 have been drawn from the preceding mathematical analysis 
which abstracts from the existence of sizes. The left-hand side of equation 
(1.6.1) should be 1—P,; the formula on the penultimate line of page 391 and 
the last formula on the next page contain misprints. 

(d) As mentioned before, the volume under review includes a self-contained, 
eighty-page “Popular Summary of Methods and Results.” The task 
of reporting to a reader without statistical training the methods by which 
the conclusions were arrived at in language which he can readily understand 
and which will keep his attention is quite a challenging one. While the meas- 
ure in which Freudenthal has succeeded in this task will only be ascertaina- 
ble from interviews with such readers, this reviewer wishes to express his 
admiration for the way it has been tackled and to venture the guess that the 
text may even evoke in some readers the desire to learn more about statisti- 
cal methodology as such. 

That this kind of report should not always satisfy the usual requirements 
for exactitude of statement is almost unavoidable. Thus, the reader may ob- 
tain an exaggerated impression of the ubiquity of the normal law for phe- 
nomena caused by a large number of random causes from the discussion on 
page 28; and it is clearly misleading to denote the correlation coefficient as 
the mathematical measure for interrelation. (The main report, in explaining 
in general the meaning of a correlation coefficient on page 105, states incor- 
rectly that if this coefficient vanishes there is no relation between the varia- 
bles.) 
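
The point about a vanishing correlation coefficient can be illustrated with a small numerical sketch. The data below are invented for illustration and are not taken from the report under review: y is an exact function of x, yet the product-moment correlation is zero because the relation is symmetric rather than linear.

```python
import math

def pearson_r(xs, ys):
    """Product-moment correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squares about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: y is completely determined by x (y = x squared).
xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]

print("r =", pearson_r(xs, ys))
```

Here the coefficient vanishes although the variables are as closely related as they could be, which is why a zero coefficient can at most be read as the absence of linear relation.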

It is stated that the older term mean deviation means more to the layman 
than the term standard deviation and should therefore be used instead of the 
latter in popular discussions. The reviewer would question both statements, 
particularly the latter. Possibility of confusion with mean deviation in the 
technical sense remains, even if this terminology is infrequently encountered 
in recent publications. 
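
The possibility of confusion is easily shown numerically. The sketch below assumes that "mean deviation in the technical sense" denotes the mean absolute deviation about the arithmetic mean; the eight observations are invented for illustration and do not come from the volume under review.

```python
import statistics

def mean_deviation(values):
    """Mean absolute deviation about the arithmetic mean (assumed technical sense)."""
    center = statistics.fmean(values)
    return sum(abs(v - center) for v in values) / len(values)

# Hypothetical observations with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

print("mean deviation     =", mean_deviation(data))
print("standard deviation =", statistics.pstdev(data))
```

For these figures the two measures differ by a quarter of the standard deviation, so a reader who meets the one term where the other is meant may be appreciably misled.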

The care bestowed on form of presentation of methods and results in the 
entire volume is exemplary. 


Table of n! and Γ(n+1/2) for the First Thousand Values of n. Herbert E. 
Salzer. National Bureau of Standards Applied Mathematics Series 16. 
Washington: U. S. Government Printing Office, 1951. Pp. iii, 10. 15 cents. 


For all positive integers n up to 1,000, this table shows n! to 16 significant 
figures. It also shows Γ(n+1/2), which is the same as (n−1/2)!, to 


5 In connection with the assumptions and approximations on which the formula is based, see also 
the remark under point 5 above. 
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eight significant figures, for only seven of which absolute accuracy is guaran- 
teed. Since expressions of the form [(m+1)/2]! occur in the formulas for a 
number of important statistical distributions, these tables may find some 
use by statisticians, though the statistical distributions themselves will 
generally be found adequately tabulated. 
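
For a reader without the table at hand, entries of the kind described can be approximated as in the sketch below. It is not Salzer's method of computation, which the review does not describe; it relies on the standard identity Γ(n+1/2) = (2n)! √π / (4ⁿ n!), and the function names and the 30-figure working precision are choices made here for illustration.

```python
import math
from decimal import Decimal, localcontext

# pi to 30 significant figures, entered as a constant for the sketch.
PI = Decimal("3.14159265358979323846264338328")

def factorial_sig(n, digits=16):
    """n! rounded to `digits` significant figures, returned as a Decimal."""
    with localcontext() as ctx:
        ctx.prec = digits
        return +Decimal(math.factorial(n))  # unary plus rounds to ctx.prec

def gamma_half_sig(n, digits=8):
    """Gamma(n + 1/2) = (2n)! * sqrt(pi) / (4**n * n!), rounded to `digits` figures."""
    with localcontext() as ctx:
        ctx.prec = 30  # working precision, well above the figures requested
        ratio = Decimal(math.factorial(2 * n)) / Decimal(4 ** n * math.factorial(n))
        value = ratio * PI.sqrt()
        ctx.prec = digits
        return +value  # round the working value to the requested figures

for n in (1, 2, 25):
    print(n, factorial_sig(n), gamma_half_sig(n))
```

Exact integer arithmetic is used for the factorials, so only the final multiplication by √π and the rounding introduce any approximation.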

W. A. W. 
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